2510.23937


# Optimized Loudspeaker Panning for Adaptive Sound-Field Correction and Non-stationary Listening Areas

**Authors**: Yuancheng Luo

**Correspondence**: Yuancheng Luo (luoyuancheng@gmail.com). Express paper number 65.

**Abstract**: Surround sound systems commonly distribute loudspeakers along standardized layouts for multichannel audio reproduction. However, in less controlled environments, practical layouts vary in loudspeaker quantity, placement, and listening locations / areas. Deviations from standard layouts introduce sound-field errors that degrade acoustic timbre, imaging, and clarity of audio content reproduction. This work introduces both Bayesian loudspeaker normalization and content panning optimization methods for sound-field correction. Conjugate prior distributions over loudspeaker-listener directions update estimated layouts for non-stationary listening locations; digital filters adapt loudspeaker acoustic responses to a common reference target at the estimated listening area without acoustic measurements. Frequency-domain panning coefficients are then optimized via sensitivity / efficiency objectives subject to spatial, electrical, and acoustic domain constraints; normalized and panned loudspeakers form virtual loudspeakers in standardized layouts for accurate multichannel reproduction. Experiments investigate the robustness of Bayesian adaptation and panning optimizations in practical applications.

## 1 Introduction

Surround sound systems for multichannel audio reproduction have risen in popularity in home theater setups that accommodate proper loudspeaker selection, layout, acoustic room treatment, and calibration established by the International Telecommunication Union (ITU) standards [1]. Conversely, the same accommodations present a barrier to entry for extemporary arrangements where loudspeakers differ in quality and placement, and operate in changeable listening locations / areas and reverberant environments.
Deviating from the standards degrades the accurate reproduction of multichannel audio content as intended by the content authors. Therefore, methods from sound-field control and reconstruction correct for the effects of irregular loudspeaker placements and room reverberation in the listening area via acoustic measurement system inversion [2, 3, 4], and modal / planewave decomposition [5, 6, 7]; such methods, however, are inapplicable when acoustic measurements are unavailable. In the absence of acoustic measurements, other sensing modalities can infer the loudspeaker layout and listening area location. Inertial measurement unit [8, 9] and Bluetooth Low Energy [10, 11] indoor tracking can estimate changes in loudspeaker position and orientation. Ultrasound [12], camera, and video can track in-room listener and loudspeaker positions within fields-of-view. Such metadata yields a 2D layout of the estimated loudspeaker placements, listening location, and a front direction. We therefore reproduce multichannel content at the listener’s area by incorporating Bayesian uncertainty of the estimated layout inputs with loudspeaker distance and orientation normalization [13, 14] to the listener, and then reformulate conventional amplitude panning methods [15, 16, 17] in terms of constrained optimization along joint spatial [18, 19], electrical [20], and acoustical [21] domains.

The paper is organized as follows: Section 2 introduces our normalization method for aligning loudspeaker acoustic transfer functions in an arbitrary layout to a common axial-reference target at the listener location; acoustic delay and attenuation compensate for varying loudspeaker-listener distances whereas minimum-phase and all-pass factorizations [22] normalize for loudspeaker orientations relative to listener locations.
We integrate estimates of the loudspeaker-listener normalization directions via Bayesian posterior updates of a novel circular distribution conjugate prior, and provide a sample calibration for a sequence of normalization angles. Section 3 presents our novel normalized loudspeaker panning optimization, which solves for frequency-dependent magnitude-gains that satisfy spatial vector-bases, electrical headroom, and acoustic power constraints; we augment the former vector-base amplitude panning with slack (VBAPS) to accommodate constraints in electric and acoustic domains. Next, we derive a panning sensitivity / efficiency objective from the augmented form that measures panned-source discreteness, and give equivalent primary and null-space formulations in fewer variables. Planewave acoustic covariances model anechoic to diffuse-field assumptions for variable sized listening areas. Optimal solutions are found via a second-order cone program [23]. Section 4 applies our model to several practical applications of loudspeaker correction under varying constraints. For high loudness targets, we find optimal gains across loudspeakers for overdriven content that maximize source discreteness. For anechoic to diffuse-field environments, we show that our panning optimization solutions converge from discrete panning to Rayleigh quotient maximizers [24]. For circular-panning over varying loudspeaker layouts, we evaluate panning sensitivity across azimuth steering-angles and recommend preferred layouts for different numbers of loudspeakers. Section 5 discusses results and future work.

## 2 Loudspeaker Normalization

Let $S(\nu,\theta)$ be the loudspeaker’s electrical-acoustical transfer function at frequency $\nu$ measured at $1$ meter distance along azimuth $\theta$ (radians) in the horizontal plane, with the acoustic path-delay removed. Under far-field assumptions, the loudspeaker frequency response attenuates by the inverse-distance and undergoes a pure delay.
It is useful to express the far-field transfer function along a listener-centric coordinate frame, which centers the origin at the listener’s location and aligns the $+x$ axis with the listener’s facing direction. The acoustic transfer function $H_{n}(\nu,\bm{r})$ at coordinate $\bm{r}\in\mathbb{R}^{2\times 1}$ for the $n^{th}$ loudspeaker located at coordinate $\bm{u}_{n}\in\mathbb{R}^{2\times 1}$ with the orientation unit-vector $\bm{o}_{n}\in\mathbb{R}^{2\times 1}$ follows $$ \begin{split}H_{n}(\nu,\bm{r})&=S\left({\nu,\theta_{n}(\bm{r})}\right)\frac{e^{-j\kappa\left\lVert\bm{s}_{n}(\bm{r})\right\rVert}}{\left\lVert\bm{s}_{n}(\bm{r})\right\rVert},\quad\kappa=\frac{2\pi\nu}{c},\\ \theta_{n}(\bm{r})&=\cos^{-1}\left({\frac{\bm{o}_{n}^{T}\bm{s}_{n}(\bm{r})}{\left\lVert\bm{s}_{n}(\bm{r})\right\rVert}}\right),\quad\bm{s}_{n}(\bm{r})=\bm{r}-\bm{u}_{n},\end{split} \tag{1} $$ where $\kappa$ is the angular wavenumber, $c$ is the speed of sound in meters/second, $\bm{s}_{n}(\bm{r})$ is the evaluation direction relative to the loudspeaker’s location, and $\theta_{n}(\bm{r})$ is the evaluation angle relative to the loudspeaker’s orientation. We can normalize the loudspeaker’s transfer function to approximate the original loudspeaker’s response $S(\nu,\theta)$ within a listening window at the listener’s location $\bm{r}=\bm{0}$. Consider the following decomposition of the loudspeaker transfer function $S(\nu,\theta)=S_{E}(\nu)S_{A}(\nu,\theta)$ into acoustical and electrical domain transfer functions $S_{A}(\nu,\theta)$ and $S_{E}(\nu)$ respectively.
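As a minimal sketch of how (1) can be evaluated, the snippet below computes the evaluation distance, the evaluation angle, and the far-field transfer function for one loudspeaker. The speed of sound `C_SOUND`, the helper names, and the cardioid-like `toy_directivity` are illustrative assumptions and not part of the paper; a real system would substitute measured or modeled loudspeaker responses for $S(\nu,\theta)$.

```python
import cmath
import math

C_SOUND = 343.0  # assumed speed of sound in m/s (not specified in the text)


def evaluation_geometry(r, u_n, o_n):
    """Distance ||s_n(r)|| and angle theta_n(r) from (1); o_n must be a unit vector."""
    s = (r[0] - u_n[0], r[1] - u_n[1])            # s_n(r) = r - u_n
    dist = math.hypot(s[0], s[1])
    cos_theta = (o_n[0] * s[0] + o_n[1] * s[1]) / dist
    cos_theta = max(-1.0, min(1.0, cos_theta))    # guard against rounding outside [-1, 1]
    return dist, math.acos(cos_theta)


def far_field_transfer(nu, r, u_n, o_n, S):
    """H_n(nu, r) = S(nu, theta_n(r)) * exp(-j*kappa*||s_n||) / ||s_n||."""
    dist, theta = evaluation_geometry(r, u_n, o_n)
    kappa = 2.0 * math.pi * nu / C_SOUND          # angular wavenumber
    return S(nu, theta) * cmath.exp(-1j * kappa * dist) / dist


def toy_directivity(nu, theta):
    """Hypothetical frequency-independent cardioid-like response, for illustration only."""
    return 0.5 * (1.0 + math.cos(theta))


# Loudspeaker 2 m in front of the listener (origin), aimed straight at the listener.
H_on_axis = far_field_transfer(1000.0, (0.0, 0.0), (2.0, 0.0), (-1.0, 0.0), toy_directivity)
# Same position, but rotated 90 degrees away from the listener.
H_off_axis = far_field_transfer(1000.0, (0.0, 0.0), (2.0, 0.0), (0.0, 1.0), toy_directivity)
```

In this toy case the on-axis magnitude is the 1 m response divided by the 2 m distance, while the rotated loudspeaker is further attenuated by its directivity; that off-axis loss is what the normalization in this section is meant to compensate.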
A filter with frequency response $G_{n}(\nu)$ that normalizes (1) to the loudspeaker’s on-axis response $H_{n}(\nu,\bm{0})G_{n}(\nu)=S(\nu,0)$ is given by $$ \begin{split}G_{n}(\nu)=Q_{n}(\nu)\left\lVert\bm{u}_{n}\right\rVert e^{j\kappa\left\lVert\bm{u}_{n}\right\rVert},\quad Q_{n}(\nu)=\frac{S_{A}(\nu,0)}{S_{A}\left({\nu,\bm{\bar{\theta}}_{n}}\right)},\end{split} \tag{2} $$ where $\bm{\bar{\theta}}_{n}=\theta_{n}(\bm{0})$ is the normalization angle between the loudspeaker’s orientation and the listener. The electrical domain term $S_{E}(\nu)$ cancels within the quotient $Q_{n}(\nu)$ in (2), thereby negating prior signal processing in loudspeaker playback. $Q_{n}(\nu)$ is therefore the acoustic relative-transfer-function between loudspeaker’s axial and listener-direction acoustic responses. Moreover, if $S_{A}(\nu,\theta)$ share a common acoustic delay and the remainder is minimum-phase for bounded $\theta$ that define a listening window, then $Q_{n}(\nu)$ must also be minimum-phase. Thus, the normalized transfer function $G_{n}(\nu)$ compensates for both loudspeakers’ orientation and distance relative to the listener as shown in Fig. 1. <details> <summary>figs/pretransform.png Details</summary> ![cc3e9781](/v1/image/cc3e9781ddff15f117c22f51946fea8b9b05400a69f1dc25d95e24c7b7346d8e) ### Visual Description ## Diagram: Sound Source Localization ### Overview The image is a diagram illustrating a sound source localization scenario. It depicts two speakers, a listener, and various vectors representing sound propagation and spatial relationships. The diagram includes labels for vectors, angles, and distances, providing a visual representation of the parameters involved in sound source localization. ### Components/Axes * **Speakers:** Two speaker icons are present, one black on the left and one blue in the center. * **Listener:** A silhouette of a person's head is enclosed in a circle on the right. 
* **Vectors:** Several labeled vectors are shown, including *un*, *vn*, *on*, and *sn(r)*. * **Angle:** The angle *θn(r)* is marked with a red arc. * **Distance:** A distance of "1 meter" is indicated. * **Circles:** Two dotted circles are centered on the speakers, representing a fixed distance (likely 1 meter). * **Labels:** * *un*: Green line connecting the black speaker to the bottom of the listener circle. * *vn* = *un* / ||*un*||: Located between the two speakers. * *on*: Blue line connecting both speakers to the listener. * *sn(r)* = *r* - *un*: Green line connecting the black speaker to the bottom of the listener circle. * *θn(r)*: Red arc representing the angle between the blue speaker and the listener. * *r* = 0: Located near the listener. * *SA(ν, θ = 0)*: Located above the listener. * *SA(ν, θn(r))* : Located near the listener. ### Detailed Analysis or Content Details * **Vector *un***: A green vector originates from the black speaker on the left and extends to a point near the listener. * **Vector *vn***: This vector, defined as *un* / ||*un*||, is positioned between the two speakers. A horizontal line connects the blue speaker to the equation. * **Vector *on***: A blue vector originates from each speaker and extends towards the listener. * **Vector *sn(r)***: A green vector is defined as *r* - *un*. * **Angle *θn(r)***: A red arc indicates the angle between the vector from the blue speaker to the listener and a horizontal line extending from the blue speaker to the listener. * **Listener Position *r***: The listener's position is defined as *r* = 0. * **Sound Source Functions *SA***: Two sound source functions are defined: *SA(ν, θ = 0)* and *SA(ν, θn(r))*. * **Distance:** The distance between the speakers and the listener is implied to be related to the 1-meter circle radius. ### Key Observations * The diagram illustrates the geometric relationships between sound sources (speakers) and a listener. 
* Vectors represent sound propagation paths and spatial relationships. * The angle *θn(r)* represents the angular displacement of the listener relative to the blue speaker. * The functions *SA(ν, θ = 0)* and *SA(ν, θn(r))* likely represent sound source characteristics at different angles. ### Interpretation The diagram provides a visual representation of the parameters involved in sound source localization. It shows how vectors, angles, and distances are used to model the spatial relationships between sound sources and a listener. The equations and labels provide a mathematical framework for understanding the sound propagation and localization process. The diagram suggests a scenario where the listener is at the origin (*r* = 0), and the sound sources are positioned around them. The angle *θn(r)* is a key parameter in determining the direction of the sound source relative to the listener. The functions *SA(ν, θ = 0)* and *SA(ν, θn(r))* likely represent the sound intensity or spectral characteristics at different angles, which can be used to localize the sound source. </details> Figure 1: Acoustic transfer function $G_{n}(\nu)$ in (2) normalizes the direct acoustic path between the listener and loudspeaker at $\bm{u}_{n}$ to be its on-axis response $S_{A}(\nu,0)$ at the normalized coordinate $\bm{v}_{n}$ . In practice, we can find the rational function approximation [25, 26] to $Q_{n}(\nu)$ , expressed in terms of minimum-phase $\mathbb{M}_{n}(\nu)$ and all-pass $\mathbb{A}_{n}(\nu)$ transfer functions given by $$ \begin{split}Q_{n}(\nu)\approx\mathbb{M}_{n}(\nu)\mathbb{A}_{n}(\nu),\quad\mathbb{A}_{n}(\nu)=\bar{\mathbb{A}}_{n}(\nu)\ddot{\mathbb{A}}_{n}(\nu),\end{split} \tag{3} $$ where $\bar{\mathbb{A}}_{n}(\nu)$ and $\ddot{\mathbb{A}}_{n}(\nu)$ are all-pass transfer functions belonging to stable and unstable components respectively. 
The unstable all-pass $\ddot{\mathbb{A}}_{n}(\nu)$ contains the reciprocal poles and zeros of the Padé approximant outside the complex unit-circle, and is ideally empty or low-order for $\theta$ in the listening window. We can realize a causal-stable filter-response $G_{n}(\nu)$ for an all-passed loudspeaker transfer function in (2) as follows: $$ \begin{split}H_{n}(\nu,\bm{0})G_{n}(\nu)&=S(\nu,0)\frac{e^{-j\kappa d}}{\ddot{\mathbb{A}}_{lcm}(\nu)}\quad\Rightarrow\quad\\ G_{n}(\nu)&=\mathbb{M}_{n}(\nu)\bar{\mathbb{A}}_{n}(\nu)\frac{\ddot{\mathbb{A}}_{n}(\nu)}{\ddot{\mathbb{A}}_{lcm}(\nu)}\left\lVert\bm{u}_{n}\right\rVert e^{j\kappa\left({\left\lVert\bm{u}_{n}\right\rVert-d}\right)},\end{split} \tag{4} $$ where $d=\max_{1\leq n\leq N}\left\lVert\bm{u}_{n}\right\rVert$ is the furthest loudspeaker distance, and $\ddot{\mathbb{A}}_{lcm}(\nu)$ is the transfer function of the set of least common multiple (LCM) reciprocal poles and zeros across the unstable all-passes $\left\{{\ddot{\mathbb{A}}_{1}(\nu),...,\ddot{\mathbb{A}}_{N}(\nu)}\right\}$. Since $\left\lVert\bm{u}_{n}\right\rVert\leq d$, the exponential term in $G_{n}(\nu)$ is a non-negative (causal) delay. In the $z$-domain, we can therefore express the all-pass and LCM transfer functions as follows: $$ \begin{split}\ddot{\mathbb{A}}_{n}(z)&=\prod_{p\in P_{n}}\left({\frac{1-p^{*}z}{1-pz^{-1}}}\right)^{k_{pn}},\quad P_{n}=\left\{{p_{1n},\ldots,p_{M_{n}n}}\right\},\\ \ddot{\mathbb{A}}_{lcm}(z)&=\prod_{p\in P}\left({\frac{1-p^{*}z}{1-pz^{-1}}}\right)^{\max\limits_{1\leq n\leq N}k_{pn}},\quad P=\cup_{n=1}^{N}P_{n},\end{split} \tag{5} $$ where $p^{*}$ is the complex conjugate of $p$, $P_{n}$ is the set of unique poles, and $k_{pn}$ is the multiplicity of pole $p$ for the $n^{th}$ loudspeaker. By taking the maximum multiplicity for each unique and unstable pole across all $\ddot{\mathbb{A}}_{n}(z)$, and dividing by the subsequent LCM $\ddot{\mathbb{A}}_{lcm}(z)$, the unstable poles in $\ddot{\mathbb{A}}_{n}(z)$ cancel and the remaining all-pass adds minimal additional group-delay in $G_{n}(\nu)$.
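The bookkeeping behind (5) can be sketched as follows: each loudspeaker's unstable all-pass is represented by a map from pole to multiplicity, the LCM keeps the maximum multiplicity of every pole, and the sections left over in $\ddot{\mathbb{A}}_{lcm}/\ddot{\mathbb{A}}_{n}$ are the differences. The dictionary representation, the function names, and the exact-equality matching of poles are assumptions made for illustration; poles obtained from a numerical rational fit would need tolerance-based matching.

```python
def lcm_multiplicities(pole_sets):
    """Maximum multiplicity of each unstable pole across all loudspeakers, as in (5)."""
    lcm = {}
    for poles in pole_sets:
        for p, k in poles.items():
            lcm[p] = max(lcm.get(p, 0), k)
    return lcm


def residual_multiplicities(poles_n, lcm):
    """Sections of A_lcm not already present in A_n (max multiplicity minus k_pn)."""
    return {p: k - poles_n.get(p, 0) for p, k in lcm.items() if k > poles_n.get(p, 0)}


# Hypothetical unstable poles (|p| > 1) for two loudspeakers, written as {pole: multiplicity}.
P1 = {complex(1.2, 0.5): 2, complex(1.5, 0.0): 1}
P2 = {complex(1.2, 0.5): 1, complex(-1.3, 0.0): 1}

lcm = lcm_multiplicities([P1, P2])
extra_1 = residual_multiplicities(P1, lcm)  # sections loudspeaker 1 lacks relative to the LCM
extra_2 = residual_multiplicities(P2, lcm)  # sections loudspeaker 2 lacks relative to the LCM
```

Under this representation, a loudspeaker whose residual map is empty receives no additional all-pass sections, which matches the stated goal of keeping the added group-delay small.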
The filtered loudspeakers’ direct paths are thus matched with a common all-passed on-axis response. Lastly, we scale the gain of the loudspeaker filter $G_{n}(\nu)$ to match the expected acoustic power at a common distance $D$, such as the median of all loudspeaker-to-listener distances, via the following room acoustic attenuation model. Consider the inverse-square law $\rho_{DP}(r)=\bar{\rho}r^{-2}$ for the attenuation of the direct acoustic path response’s nominal power $\bar{\rho}$ at distance $r$ from a loudspeaker. In a room environment, let $\rho_{IP}(r)$ be the total power of indirect acoustic paths at distance $r$. We model the direct-to-indirect power ratio at $r$ and the total power as follows: $$ \begin{split}\frac{\rho_{DP}(r)}{\rho_{IP}(r)}&=\left({\frac{d_{c}}{r}}\right)^{2\beta},\quad\beta=10^{\frac{\gamma\textrm{ dB/dd}}{10}}\quad\textrm{(attenuation rate)},\\ \rho(r)&=\rho_{DP}(r)+\rho_{IP}(r)=\bar{\rho}r^{-2}\left({1+\left({\frac{r}{d_{c}}}\right)^{2\beta}}\right),\end{split} \tag{6} $$ where $d_{c}$ is the so-called critical distance (meters) where the direct and indirect acoustic powers are equivalent, and $\beta$ is a decay-rate parameterized by $\gamma$ decibels (dB) per double-distance (dd); typical $\gamma\in\left\{{0,-3}\right\}$ and $0.5\leq d_{c}\leq 1.5$ span idealized concert-hall to small-room spaces [27]. For $\gamma=0$ ($\beta=1$) the indirect power is constant over distance, whereas for $\gamma=-3$ ($\beta\approx 0.5$) it decays by roughly $3$ dB per doubling of distance. Normalizing the power at distance $r$ to $D$ therefore follows $$ \begin{split}F(r,\,D,\,d_{c})=\sqrt{\frac{\rho(D)}{\rho(r)}}=\frac{r}{D}\sqrt{\frac{d_{c}^{2\beta}+D^{2\beta}}{d_{c}^{2\beta}+r^{2\beta}}},\end{split} \tag{7} $$ whereby substituting $\left\lVert\bm{u}_{n}\right\rVert$ with $F(\left\lVert\bm{u}_{n}\right\rVert,\,D,\,d_{c})$ in (4) compensates for loudspeaker distances to the listener in a room.
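The distance-normalization gain in (7) is simple to compute directly. The sketch below is one possible implementation; the function names and the default $\gamma=-3$ dB/dd are illustrative choices, not values prescribed by the paper.

```python
import math


def decay_exponent(gamma_db_per_dd):
    """beta = 10^(gamma/10), with gamma in dB per doubling of distance."""
    return 10.0 ** (gamma_db_per_dd / 10.0)


def distance_gain(r, D, d_c, gamma_db_per_dd=-3.0):
    """Amplitude gain F(r, D, d_c) from (7) that maps the power at distance r to distance D."""
    b2 = 2.0 * decay_exponent(gamma_db_per_dd)
    return (r / D) * math.sqrt((d_c ** b2 + D ** b2) / (d_c ** b2 + r ** b2))


# Hypothetical layout: loudspeakers at 1.5 m, 2.0 m, and 3.2 m; D is the median distance.
distances = [1.5, 2.0, 3.2]
D = sorted(distances)[len(distances) // 2]
gains = [distance_gain(r, D, d_c=1.0) for r in distances]
```

Two limiting cases are a useful sanity check: a very large critical distance recovers the free-field gain $r/D$, and with $\gamma=0$ a very small critical distance gives a gain near one because the modeled reverberant power no longer depends on distance.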
**Model Uncertainty for Non-stationary Targets:** In instances where the listener’s location changes over time or requires online estimation, we normalize the loudspeaker via the mean listener distance $\frac{1}{T}\int_{0}^{T}\left\lVert\bm{u}_{n}(t)\right\rVert dt$, and treat the normalization angle $\bm{\bar{\theta}}_{n}$ relative to the loudspeaker orientation $\bm{o}_{n}$ in (2) as a random variable. The target transfer function $G_{n}(\nu)$ and quotient term $Q_{n}(\nu)$ are re-defined to minimize the expected squared-differences between the anechoic responses $S_{A}(\nu,\theta)$ sampled over axial-centered and loudspeaker-listener centered circular probability distribution functions (PDFs) $f_{0}(\theta)$ and $f_{n}(\theta)$, $\forall\,1\leq n\leq N$ respectively; circular PDFs satisfy $f(\theta)=f(\theta+2\pi k)$, $\forall k\in\mathbb{Z}$. We present two acoustic averages: $$ \begin{split}\bar{S}_{A}(\nu,f(\theta))&=\mathbb{E}\left[{S_{A}(\nu,\theta)}\right]=\int S_{A}(\nu,\theta)f(\theta)d\theta,\\ \hat{S}_{A}(\nu,f(\theta))&=\mathbb{E}\left[{\left|{S_{A}(\nu,\theta)}\right|^{2}}\right]=\int\left|{S_{A}(\nu,\theta)}\right|^{2}f(\theta)d\theta,\end{split} \tag{8} $$ where $\bar{S}_{A}(\nu,f(\theta))$ and $\hat{S}_{A}(\nu,f(\theta))$ are spatial windowed averages of the acoustic response and power respectively; axial window response average $\bar{S}_{A}(\nu,f_{0}(\theta))$ and power average $\hat{S}_{A}(\nu,f_{0}(\theta))$ sample from the $f_{0}(\theta)$ distribution.
The modified quotient term $Q_{n}(\nu)$ in (2) is replaced with the weighted least-squares minimizer of $\operatorname*{arg\,min}_{X}\int\left|{S_{A}(\nu,\theta)X-\bar{S}_{A}(\nu,f_{0}(\theta))}\right|^{2}f_{n}(\theta)d\theta$ given by $$ \begin{split}\bar{Q}_{n}(\nu)=\bar{S}_{A}(\nu,f_{0}(\theta))\frac{\bar{S}_{A}^{*}(\nu,f_{n}(\theta))}{\hat{S}_{A}(\nu,f_{n}(\theta))},\end{split} \tag{9} $$ where $\bar{S}_{A}^{*}(\nu,f_{n}(\theta))$ is the complex conjugate, and $\bar{Q}_{n}(\nu)$ accounts for both amplitude and phase differences in the averaged responses. The analogous quotient for the spatial windowed acoustic power average follows $$ \begin{split}\hat{Q}_{n}(\nu)&=\sqrt{\frac{\hat{S}_{A}(\nu,f_{0}(\theta))}{\hat{S}_{A}(\nu,f_{n}(\theta))}},\end{split} \tag{10} $$ where $\hat{Q}_{n}(\nu)$ has zero-phase and therefore compensates for only the amplitude. Both quotients can be efficiently evaluated if $f_{0}(\theta)$, $f_{n}(\theta)$ are both uni-modal and smooth over azimuth, have expansions along a common orthogonal basis with $S_{A}(\nu,\theta)$, and follow the contours of a listening window. Let us consider the circular distribution $f(\theta)$ defined by the squared-exponential of the chordal distance $d(\theta)$ on a unit-disk, which along with $S_{A}(\nu,\theta)$ has a series-expansion over the Legendre polynomials [28], and is normalized over the domain of all azimuth angles $-\pi\leq\theta\leq\pi$: $$ \begin{split}f(\theta)&=\frac{e^{\frac{-d^{2}(\theta-\mu)}{2\ell^{2}}}}{2\pi e^{-\ell^{-2}}J_{0}(j\ell^{-2})},\quad d(\theta)=2\sin\left({\frac{{\theta}\textrm{ mod }{2\pi}}{2}}\right),\end{split} \tag{11} $$ where $J_{0}$ is the Bessel function of the first kind, $\mu$ is the mean azimuth, and $\ell$ is the dispersion. The function is symmetric w.r.t. its maximum $f(\mu)$ and minimum $f(\mu\pm\pi)$, infinitely differentiable in all azimuths, and its percentiles are computable via series expansion in appendix (26).
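To make the windowed quotients (9) and (10) concrete, the sketch below evaluates them by brute-force summation on a uniform azimuth grid, with the chordal squared-exponential window of (11) normalized numerically rather than through the Bessel term. This is an illustrative assumption-laden shortcut, not the series-expansion evaluation the text refers to: the grid size, function names, and the toy directivity are all hypothetical, and the default axial dispersion `ell_0=0.4396` corresponds to a $60^{\circ}$ FWHM window.

```python
import math


def chordal_weights(grid, mu, ell):
    """Discrete window weights proportional to exp(-d^2(theta - mu) / (2 ell^2)), summing to 1."""
    w = [math.exp(-(2.0 * math.sin((t - mu) / 2.0)) ** 2 / (2.0 * ell ** 2)) for t in grid]
    total = sum(w)
    return [x / total for x in w]


def windowed_averages(S_A, nu, grid, weights):
    """Numerical versions of the response and power averages in (8)."""
    resp = sum(S_A(nu, t) * w for t, w in zip(grid, weights))
    power = sum(abs(S_A(nu, t)) ** 2 * w for t, w in zip(grid, weights))
    return resp, power


def quotients(S_A, nu, mu_n, ell_n, ell_0=0.4396, n_grid=720):
    """Return (Q_bar, Q_hat) from (9) and (10); assumes non-zero power under the f_n window."""
    grid = [-math.pi + 2.0 * math.pi * k / n_grid for k in range(n_grid)]
    resp_0, pow_0 = windowed_averages(S_A, nu, grid, chordal_weights(grid, 0.0, ell_0))
    resp_n, pow_n = windowed_averages(S_A, nu, grid, chordal_weights(grid, mu_n, ell_n))
    q_bar = resp_0 * complex(resp_n).conjugate() / pow_n
    q_hat = math.sqrt(pow_0 / pow_n)
    return q_bar, q_hat


def toy_directivity(nu, theta):
    """Hypothetical cardioid-like acoustic response, for illustration only."""
    return 0.5 * (1.0 + math.cos(theta))


# Listener 60 degrees off the loudspeaker axis: the power quotient boosts the response.
q_bar_60, q_hat_60 = quotients(toy_directivity, 1000.0, math.radians(60.0), 0.4396)
```

When the listener window coincides with the axial window the power quotient is one, and for a loudspeaker whose response falls off with angle it grows as the window moves off-axis.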
Large dispersion $\ell$ gives a uniform distribution as $\lim_{\ell\to\infty}f(\theta)=(2\pi)^{-1}$; small dispersion gives the Dirac distribution as $\lim_{\ell\to 0}f(\theta)=\delta(\theta-\mu)$. We can bound the dispersion via design parameters characterizing a listening window’s peak such as the full-width at half-maximum (FWHM) measure: $$ \begin{split}\frac{f(\mu)}{2}&=f\left({\mu\pm\frac{\textrm{FWHM}}{2}}\right),\quad 0\leq\textrm{FWHM}\leq 2\pi,\\ \ell&=\frac{2\sin\left({\frac{\textrm{FWHM}}{4}}\right)}{\sqrt{2\ln(2)}}\,\,\,\Rightarrow\,\,\,0\leq\ell\leq\sqrt{2/\ln(2)},\end{split} \tag{12} $$ which defines the angular width over which $f(\theta)$ exceeds half its maximum amplitude, as shown in Fig. 2. At the upper-limit FWHM $360^{\circ}$, $f(\theta)$ contains $\left\{{60.9,\,33.2,\,22.5}\right\}\%$ of its mass within the frontal intervals $\left|{\theta-\mu}\right|\leq\left\{{90,\,45,\,30}\right\}^{\circ}$ respectively. For tighter FWHM $\leq 90.22^{\circ}$ bounds, $f(\theta)$ contains the $95\%$ confidence interval in the half-space $\left|{\theta-\mu}\right|\leq 90^{\circ}$ of its mean azimuth $\mu$. For the axial-centered PDF in (8), we set the window’s FWHM to $60^{\circ}$ where $f_{0}(\theta)=f(\theta\,|\,\mu=0,\ell=0.4396)$. We now proceed with online adaptation of the normalization angles $\bm{\bar{\theta}}_{n}$ over time. <details> <summary>figs/circular_distribution.png Details</summary> ![895ba18f](/v1/image/895ba18f521ca2b30f223008bc2e6f996a8c0176d5d50cd506f06a433ec2a7e5) ### Visual Description ## Chart: Circular Distribution Function ### Overview The image is a plot of a circular distribution function, showing the probability density function (PDF) f(θ) as a function of θ (Degrees). The plot highlights key percentiles and the Full Width at Half Maximum (FWHM). ### Components/Axes * **Title:** Circular Distribution Function * **X-axis:** θ (Degrees), ranging from -180 to 180, with tick marks at -180, -135, -90, -45, 0, 45, 90, 135, and 180.
* **Y-axis:** f(θ), ranging from 0 to 0.5, with tick marks at 0, 0.1, 0.2, 0.3, 0.4, and 0.5. * **Curve:** A bell-shaped curve representing the probability density function f(θ) PDF, centered at θ = 0. * **Horizontal Dashed Line:** Indicates the "Half-maximum" value, intersecting the y-axis at approximately 0.28-0.3. * **Vertical Dashed Lines:** Represent various percentiles of the distribution. * 2.5% Percentile (light blue) at approximately -90 degrees. * 5% Percentile (magenta) at approximately -70 degrees. * 13.9% Percentile (red) at approximately -45 degrees. * 25% Percentile (blue) at approximately -30 degrees. * 75% Percentile (blue) at approximately +30 degrees. * 86.1% Percentile (red) at approximately +45 degrees. * 95% Percentile (magenta) at approximately +70 degrees. * 97.5% Percentile (light blue) at approximately +90 degrees. * **FWHM:** A red horizontal line segment indicating the Full Width at Half Maximum, centered around 0 degrees. The endpoints of this line intersect the curve at the half-maximum value. * -FWHM/2 (red) at approximately -45 degrees. * +FWHM/2 (red) at approximately +45 degrees. ### Detailed Analysis * **f(θ) PDF:** The black curve represents the probability density function. It is symmetric around θ = 0, reaching a maximum value of approximately 0.52. * **Half-maximum:** The horizontal dashed line is at approximately f(θ) = 0.29. * **Percentiles:** The vertical dashed lines indicate the θ values corresponding to the given percentiles. * 2.5% Percentile: θ ≈ -90 degrees * 5% Percentile: θ ≈ -70 degrees * 13.9% Percentile: θ ≈ -45 degrees * 25% Percentile: θ ≈ -30 degrees * 75% Percentile: θ ≈ +30 degrees * 86.1% Percentile: θ ≈ +45 degrees * 95% Percentile: θ ≈ +70 degrees * 97.5% Percentile: θ ≈ +90 degrees * **FWHM:** The Full Width at Half Maximum spans from approximately -45 degrees to +45 degrees. ### Key Observations * The distribution is symmetric around θ = 0. * The FWHM is centered at θ = 0. 
* The percentiles are symmetrically distributed around θ = 0. ### Interpretation The plot illustrates a circular distribution function, likely a von Mises distribution or a similar distribution used for circular data. The symmetry around θ = 0 suggests that the mean direction is 0 degrees. The FWHM provides a measure of the spread or concentration of the distribution. The percentiles give an indication of the distribution's shape and how the data is distributed around the mean direction. The plot is useful for visualizing and understanding the properties of circular data. </details> Figure 2: Circular distribution prior (FWHM $90.2^{\circ}$ ) contains $\left\{{50,90,95}\right\}\%$ of normalization angles within $\left|{\theta}\right|≤\left\{{27.4,\,72,\,90}\right\}^{\circ}$ of the mean angle. Suppose we have measured a normalization angle $\bar{\theta}$ belonging to the $n^{th}$ loudspeaker with known measurement dispersion $\bar{\ell}$ such that the likelihood function $f(\bar{\theta}\,|\,\mu=\bm{\bar{\theta}}_{n},\ell=\bar{\ell})$ follows the squared-exponential chordal function in (11). Let the unknown normalization angle $\bm{\bar{\theta}}_{n}$ of the $n^{th}$ loudspeaker have a squared-exponential chordal prior-distribution $f(\bm{\bar{\theta}}_{n}|\,\mu=\mu_{n},\ell=\ell_{n})$ with initial hyperparameters $\mu_{n}=0$ mean azimuth and $\ell=\sqrt{2/\ln(2)}$ maximum dispersion. The posterior normalization angle therefore has a conjugate distribution with hyperparameters following appendix (27). 
Over multiple time-steps $t$, the likelihood, prior, and posterior functions across measured angles $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$ with dispersion $\bar{\ell}_{n}^{\left\{{t}\right\}}$ are given by $$ \begin{split}L\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)&=f\left({\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}\,|\,\mu=\bm{\bar{\theta}}_{n},\ell=\bar{\ell}_{n}^{\left\{{t}\right\}}}\right),\quad\textrm{Likelihood}\\ P(\bm{\bar{\theta}}_{n})&=f\left({\bm{\bar{\theta}}_{n}\,|\,\mu=\mu_{n}^{\left\{{t-1}\right\}},\,\ell=\ell_{n}^{\left\{{t-1}\right\}}}\right),\quad\textrm{Prior}\\ P\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)&\propto L\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)P(\bm{\bar{\theta}}_{n}),\qquad\textrm{Posterior}\end{split} \tag{13} $$ where the reported normalization angle $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$ is a point-estimate taken within a measurement session, and the dispersion $\bar{\ell}_{n}^{\left\{{t}\right\}}$ is proportional to the point-estimate’s confidence interval. Both quantities can vary over time, as the listener’s location may change between sessions (e.g. different seating) and measurements may be taken under different noise conditions. The initial hyperparameters for mean $\mu_{n}^{\left\{{0}\right\}}=0$ and dispersion $\ell_{n}^{\left\{{0}\right\}}=0.6515$ (FWHM $90.22^{\circ}$) are informative as loudspeakers generally orient towards the intended listening area.
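The recursion in (13) can be sketched in a few lines. Because $d^{2}(\theta)=2(1-\cos\theta)$, the density in (11) is proportional to $e^{\ell^{-2}\cos(\theta-\mu)}$, so the product of a likelihood and a prior of this form is again of the same form, with its mean and concentration $\ell^{-2}$ obtained by adding the two as complex phasors. The code below uses that identity; the appendix update equations are not reproduced in this section, so this should be read as an assumed equivalent rather than the paper's exact expressions. The function names and the measurement sequence are hypothetical.

```python
import cmath
import math


def fwhm_to_dispersion(fwhm_rad):
    """Dispersion ell for a given full-width at half-maximum (radians), as in (12)."""
    return 2.0 * math.sin(fwhm_rad / 4.0) / math.sqrt(2.0 * math.log(2.0))


def posterior_update(mu_prior, ell_prior, theta_meas, ell_meas):
    """One Bayesian step: combine prior (mu, ell) with a measured angle and its dispersion."""
    # Each term is a phasor with magnitude ell^-2 (concentration) and angle equal to its mean.
    z = cmath.rect(ell_prior ** -2, mu_prior) + cmath.rect(ell_meas ** -2, theta_meas)
    kappa_post = abs(z)
    if kappa_post == 0.0:
        # Prior and measurement cancel exactly: the posterior is uniform.
        return mu_prior, math.inf
    return cmath.phase(z), kappa_post ** -0.5


# Initial informative prior from the text: mean 0 and FWHM 90.22 degrees.
mu, ell = 0.0, fwhm_to_dispersion(math.radians(90.22))

# Hypothetical measurement sessions: (angle in degrees, measurement dispersion).
for angle_deg, ell_meas in [(-45.0, 0.5), (45.0, 0.4), (80.0, 0.3)]:
    mu, ell = posterior_update(mu, ell, math.radians(angle_deg), ell_meas)
```

After each step, `mu` and `ell` play the role of $\mu_{n}^{\{t\}}$ and $\ell_{n}^{\{t\}}$ and would be fed back into the window $f_{n}(\theta)$ used by the quotients (9) and (10). Measurements that agree with the prior shrink the dispersion, while conflicting measurements partially cancel and leave it larger.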
The posterior estimate of $\bm{\bar{\theta}}_{n}$ follows Bayes’ theorem, where the current mean $\mu_{n}^{\left\{{t}\right\}}$ and dispersion $\ell_{n}^{\left\{{t}\right\}}$ hyperparameters are updated from the measurement terms $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n},\bar{\ell}_{n}^{\left\{{t}\right\}}$ in the likelihood function and the previous hyperparameters $\mu_{n}^{\left\{{t-1}\right\}},\ell_{n}^{\left\{{t-1}\right\}}$ via appendix (28). Lastly, the normalization filter’s quotient terms (9), (10) are updated for PDF $f_{n}(\theta)=f(\theta\,|\,\mu=\mu_{n}^{\left\{{t}\right\}},\ell=\ell_{n}^{\left\{{t}\right\}})$ , and the filters $G_{n}(\nu)$ are re-computed. Let us step-through the following example: <details> <summary>figs/fst_window_kernel_sample_transfer_functions.png Details</summary> ![5b21ab6c](/v1/image/5b21ab6c603dac6c711e035540880db881da2904440ac29d4be6bc61a286e680) ### Visual Description ## Heatmap: Loudspeaker Acoustic Transfer Function ### Overview The image is a heatmap visualizing the acoustic transfer function of a loudspeaker, denoted as S_A(ν, θ). The heatmap displays the magnitude (in dB) as a function of frequency (Hz) and azimuth angle θ (degrees). The color scale represents the magnitude, ranging from 0 dB (yellow) to -20 dB (dark blue). ### Components/Axes * **Title:** Loudspeaker Acoustic Transfer Function S_A(ν, θ) * **X-axis:** Azimuth θ (Degrees), ranging from -180 to 180 degrees, with markers at -180, -135, -90, -45, 0, 45, 90, 135, and 180. * **Y-axis:** Frequency (Hz), on a logarithmic scale, with markers at 10^2 (100 Hz), 10^3 (1000 Hz), and 10^4 (10000 Hz). * **Colorbar (Magnitude):** Magnitude (dB), ranging from 0 dB (yellow) to -20 dB (dark blue), with markers at 0, -5, -10, -15, and -20. ### Detailed Analysis The heatmap shows the magnitude of the acoustic transfer function across different frequencies and azimuth angles. 
* **Low Frequencies (100 Hz):** The magnitude is relatively uniform and high (yellow, near 0 dB) across all azimuth angles. * **Mid Frequencies (1000 Hz):** The magnitude varies with azimuth angle. There are regions of higher magnitude (yellow/green) around 0 degrees and lower magnitudes (blue) at larger angles (e.g., +/- 90 to +/- 180 degrees). * **High Frequencies (10000 Hz):** The magnitude exhibits a more complex pattern with distinct lobes of high and low magnitude. The highest magnitude is concentrated around 0 degrees, with significant drops in magnitude at larger angles. ### Key Observations * The loudspeaker exhibits a relatively uniform response at low frequencies across all azimuth angles. * At higher frequencies, the response becomes more directional, with the highest magnitude concentrated around the forward direction (0 degrees azimuth). * There are distinct patterns of constructive and destructive interference at higher frequencies, resulting in lobes of high and low magnitude. ### Interpretation The heatmap provides a visual representation of the loudspeaker's acoustic characteristics. It demonstrates that the loudspeaker's performance is frequency-dependent and directional. At low frequencies, the sound is radiated relatively uniformly in all directions. However, as the frequency increases, the sound becomes more focused in the forward direction. The observed lobes at higher frequencies suggest the presence of interference effects, which are likely due to the physical dimensions and design of the loudspeaker. This information is crucial for understanding the loudspeaker's performance in different acoustic environments and for optimizing its placement and usage. 
</details> <details> <summary>figs/fst_window_kernel_bayes_circular_dist.png Details</summary> ![511ca76b](/v1/image/511ca76bbc4d5db0f509381e98f0db268dd0363026e84ab8b01f497c9823bd8d) ### Visual Description ## Chart: Normalization Angle Circular Distributions across Time-steps ### Overview The image is a line chart displaying the probability density functions of normalization angles across different time steps. It compares the "Ground Truth" with "Prior" distributions at time t=0, and "Likelihood" and "Posterior" distributions at times t=1, t=2, and t=3. The x-axis represents the azimuth angle in degrees, and the y-axis represents the probability density f(θ). ### Components/Axes * **Title:** Normalization Angle Circular Distributions across Time-steps * **X-axis:** * Label: Azimuth θ (Degrees) * Scale: -180 to 180 degrees, with markers at -180, -135, -90, -45, 0, 45, 90, 135, and 180. * **Y-axis:** * Label: f(θ) * Scale: 0 to 2, with markers at 0, 0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, and 2. * **Legend (Top-Left):** * Blue Dashed Line: Ground Truth * Black Solid Line: t = 0: Prior * Brown Dotted Line: t = 1: Likelihood * Orange Solid Line: t = 1: Posterior * Yellow Dotted Line: t = 2: Likelihood * Gold Solid Line: t = 2: Posterior * Purple Dotted Line: t = 3: Likelihood * Purple Solid Line: t = 3: Posterior ### Detailed Analysis * **Ground Truth (Blue Dashed Line):** A vertical line at approximately 90 degrees, indicating a fixed true angle. * **t = 0: Prior (Black Solid Line):** A bell-shaped curve centered around 0 degrees, with a peak probability density of approximately 0.57. * **t = 1: Likelihood (Brown Dotted Line):** A bell-shaped curve centered around -45 degrees, with a peak probability density of approximately 0.78. * **t = 1: Posterior (Orange Solid Line):** A bell-shaped curve centered around 0 degrees, with a peak probability density of approximately 1.1. 
* **t = 2: Likelihood (Yellow Dotted Line):** A bell-shaped curve centered around 45 degrees, with a peak probability density of approximately 1.15. * **t = 2: Posterior (Gold Solid Line):** A bell-shaped curve centered around 67 degrees, with a peak probability density of approximately 1.7. * **t = 3: Likelihood (Purple Dotted Line):** A bell-shaped curve centered around 80 degrees, with a peak probability density of approximately 1.7. * **t = 3: Posterior (Purple Solid Line):** A bell-shaped curve centered around 90 degrees, with a peak probability density of approximately 2.1. **Trend Verification:** * The "Prior" distribution is centered at 0 degrees. * The "Likelihood" distributions shift from -45 degrees (t=1) to 45 degrees (t=2) to 80 degrees (t=3). * The "Posterior" distributions shift from 0 degrees (t=1) to 67 degrees (t=2) to 90 degrees (t=3). * The "Posterior" distributions become more concentrated around the "Ground Truth" as time progresses. ### Key Observations * The "Prior" distribution represents the initial belief about the angle. * The "Likelihood" distributions represent the information obtained from the data at each time step. * The "Posterior" distributions represent the updated belief about the angle after incorporating the data. * As time progresses, the "Posterior" distributions converge towards the "Ground Truth," indicating that the estimation of the angle improves with more data. ### Interpretation The chart illustrates a Bayesian inference process where the initial "Prior" belief is updated with new "Likelihood" information at each time step to obtain a "Posterior" estimate. The convergence of the "Posterior" distributions towards the "Ground Truth" demonstrates the effectiveness of this process in refining the estimate of the normalization angle over time. The shift in the "Likelihood" distributions indicates that the data provides increasingly accurate information about the true angle as time progresses. 
The "Posterior" distributions become more peaked and centered around the "Ground Truth" as more data is incorporated, indicating a reduction in uncertainty and an improvement in the accuracy of the angle estimation. </details> <details> <summary>figs/fst_window_kernel_bayes_filters.png Details</summary> ![3dbcd023](/v1/image/3dbcd02382b42ba1124a6963477db771c8fabe5437824e6eacfedeefd3ec342a) ### Visual Description ## Chart: Acoustic Windowed Power Averages and Correction Quotient ### Overview The image contains two line charts displaying acoustic windowed power averages and correction quotients as a function of frequency. The top chart shows the power averages for different time windows (prior, posterior, and axial), while the bottom chart shows the corresponding correction quotients. The x-axis represents frequency in Hz (logarithmic scale), and the y-axis represents magnitude in dB. ### Components/Axes **Top Chart:** * **Title:** Acoustic Windowed Power Averages ŜA(ν, f(θ)) * **X-axis:** Frequency (Hz), logarithmic scale with markers at 10^2, 10^3, and 10^4. * **Y-axis:** Magnitude (dB), linear scale with markers from -20 to 0 in increments of 5 (-20, -15, -10, -5, 0). * **Legend (Top-Left):** * Black: t = 0: Prior Window * Orange: t = 1: Posterior Window * Yellow: t = 2: Posterior Window * Purple: t = 3: Posterior Window * Blue Dashed: Axial Window ŜA(ν, f0(θ)) **Bottom Chart:** * **Title:** Acoustic Windowed Correction Quotient Q̂(ν) * **X-axis:** Frequency (Hz), logarithmic scale with markers at 10^2, 10^3, and 10^4. * **Y-axis:** Magnitude (dB), linear scale with markers from 0 to 10 in increments of 5 (0, 5, 10). 
* **Legend (Top-Left):** * Black: t = 0: Prior Window * Orange: t = 1: Posterior Window * Yellow: t = 2: Posterior Window * Purple: t = 3: Posterior Window ### Detailed Analysis **Top Chart: Acoustic Windowed Power Averages** * **Black (t=0: Prior Window):** The line starts at approximately -2 dB, remains relatively flat until around 10^3 Hz, and then decreases gradually to approximately -3 dB at 10^4 Hz. * **Orange (t=1: Posterior Window):** The line starts at approximately -2 dB, remains relatively flat until around 10^3 Hz, and then decreases to approximately -14 dB at 10^4 Hz. * **Yellow (t=2: Posterior Window):** The line starts at approximately -2 dB, remains relatively flat until around 10^3 Hz, and then decreases, reaching a minimum of approximately -17 dB around 5×10^3 Hz, before increasing slightly to approximately -15 dB at 10^4 Hz. * **Purple (t=3: Posterior Window):** The line starts at approximately -2 dB, remains relatively flat until around 10^3 Hz, and then decreases significantly, reaching a minimum of approximately -21 dB around 4×10^3 Hz, before oscillating to approximately -16 dB at 10^4 Hz. * **Blue Dashed (Axial Window):** The line starts at approximately -1 dB, remains relatively flat until around 10^3 Hz, and then decreases to approximately -10 dB at 10^4 Hz. **Bottom Chart: Acoustic Windowed Correction Quotient** * **Black (t=0: Prior Window):** The line starts at approximately 1 dB, remains relatively flat until around 10^3 Hz, and then increases slightly to approximately 2 dB at 10^4 Hz. * **Orange (t=1: Posterior Window):** The line starts at approximately 1 dB, remains relatively flat until around 10^3 Hz, and then increases slightly to approximately 3 dB at 10^4 Hz. * **Yellow (t=2: Posterior Window):** The line starts at approximately 1 dB, remains relatively flat until around 10^3 Hz, and then increases to approximately 4 dB at 10^4 Hz. 
* **Purple (t=3: Posterior Window):** The line starts at approximately 1 dB, increases significantly after 10^3 Hz, reaching a maximum of approximately 12 dB around 4×10^3 Hz, before oscillating to approximately 7 dB at 10^4 Hz. ### Key Observations * In the top chart, the power averages for all windows are similar at lower frequencies (around 10^2 Hz). * As frequency increases, the power averages for the posterior windows (t=1, t=2, t=3) and the axial window decrease more significantly than the prior window (t=0). * In the bottom chart, the correction quotients for the prior and posterior windows (t=0, t=1, t=2) are relatively flat and close to 0 dB until around 10^3 Hz. * The correction quotient for t=3 increases significantly with frequency, indicating a larger correction is needed for this window at higher frequencies. ### Interpretation The charts illustrate the effect of different windowing techniques on acoustic power averages and the corresponding correction quotients. The prior window (t=0) appears to be less affected by frequency changes compared to the posterior windows (t=1, t=2, t=3) and the axial window. The correction quotient for t=3 suggests that this window requires a more substantial correction at higher frequencies, possibly due to increased distortion or noise. The data suggests that the choice of windowing technique can significantly impact the accuracy of acoustic measurements, especially at higher frequencies. </details> Figure 3: We equalize a sample loudspeaker with acoustic responses over the horizontal plane (left) between Bayesian estimates of the normalization angle $\bm{\bar{\theta}}$ in (13) (center) and the axial windowed power average. The acoustic power averages (right) over the posterior circular distribution windows $f(\theta\,|\,\mu=\mu^{\left\{{t}\right\}},\ell=\ell^{\left\{{t}\right\}})$ update across time-steps to yield a sequence of quotient correction targets in (10). 
Consider the sample loudspeaker responses and sequence of estimated normalization angles in Fig. 3, where the listener is offset $90^{\circ}$ from the loudspeaker axis in azimuth. At $t=0$, prior to any measurements, the normalization angle assumes a circular distribution centered on the loudspeaker axis $\mu=0$ with wide dispersion FWHM $90.22^{\circ}$. The first estimate $\bm{\bar{\theta}}^{\left\{{1}\right\}}=-60^{\circ}$ is inaccurate with high dispersion FWHM $90^{\circ}$, as shown in the $t=1$ likelihood. Although the posterior shifts its mean halfway between the prior’s mean and the estimated angle, the dispersion remains high, which gives a similar acoustic windowed power average and correction quotient to that of the prior. The second estimate $\bm{\bar{\theta}}^{\left\{{2}\right\}}=75^{\circ}$ is more accurate with lower dispersion FWHM $45^{\circ}$. The resulting posterior shifts much closer towards the estimate at much reduced dispersion, which distinguishes its windowed power average and correction quotient from the prior. The final and most accurate estimate $\bm{\bar{\theta}}^{\left\{{3}\right\}}=90^{\circ}$ with lowest dispersion FWHM $30^{\circ}$ yields a sharp posterior near the true normalization angle, which induces comb-filter patterns in the correction quotient due to lobing in the loudspeaker’s anechoic response in azimuth. Therefore in practice, we avoid equalizing to direct acoustic-paths by enforcing a lower-bound dispersion FWHM $45^{\circ}$ for circular distributions $f_{n}(\theta)$ when computing the correction quotients $\hat{Q}_{n}(\nu)$. 
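The update sequence in this example can be approximated with a short sketch. It assumes that the circular distributions are von Mises densities and that each posterior follows the standard von Mises conjugate update (a sum of concentration-weighted unit phasors); the exact parameterization in (13), including the dispersion parameter $\ell$, may differ. The helper names `fwhm_to_kappa` and `update` are hypothetical.

```python
import cmath
import math

def fwhm_to_kappa(fwhm_deg):
    """Concentration of a von Mises density with the given FWHM in degrees.

    Solves exp(kappa * (cos(h) - 1)) = 1/2 at the half-width h = FWHM / 2.
    """
    half = math.radians(fwhm_deg) / 2.0
    return math.log(2.0) / (1.0 - math.cos(half))

def update(mu_deg, kappa, est_deg, est_fwhm_deg):
    """Assumed conjugate update: add concentration-weighted unit phasors."""
    z = kappa * cmath.exp(1j * math.radians(mu_deg))
    z += fwhm_to_kappa(est_fwhm_deg) * cmath.exp(1j * math.radians(est_deg))
    return math.degrees(cmath.phase(z)), abs(z)

# Prior on the loudspeaker axis, followed by the three estimates from the text.
mu, kappa = 0.0, fwhm_to_kappa(90.0)
posteriors = []
for est_deg, est_fwhm in [(-60.0, 90.0), (75.0, 45.0), (90.0, 30.0)]:
    mu, kappa = update(mu, kappa, est_deg, est_fwhm)
    posteriors.append((mu, kappa))
```

Under these assumptions the first posterior mean lands at $-30^{\circ}$, halfway between the prior mean and the first estimate, and the concentration grows with every update. Later posterior means need not match Fig. 3 exactly, since the figure is produced with the distribution in (13).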
3 Loudspeaker Panning Optimization Let $R_{n}(\nu,\bm{r})=H_{n}(\nu,\bm{r})G_{n}(\nu)$ be the acoustic response at frequency $\nu$ and coordinate $\bm{r}$ of the $n^{th}$ normalized loudspeaker in (4). The overall response of the normalized loudspeaker array follows $$ \begin{split}Y(\nu,\bm{r})=\sum_{n=1}^{N}R_{n}(\nu,\bm{r})X_{n}(\nu),\end{split} \tag{14} $$ where $X_{n}(\nu)$ is the transfer function of the array’s weights belonging to the $n^{th}$ loudspeaker. For normalized loudspeaker panning, we constrain $X_{n}(\nu)$ to have a common phase-component (e.g. delay or all-pass) across loudspeakers and solve for the unknown magnitude components $x_{n}(\nu)=\left|{X_{n}(\nu)}\right|$, which are subject to frequency-dependent spatial-electrical-acoustic domain constraints. The magnitude components at frequency $\nu$ are therefore expressed as a vector of panning gains $\bm{x}=\left[{x_{1},\ldots,x_{N}}\right]^{T}\in\mathbb{R}^{N\times 1}$, whereby we omit the frequency $\nu$ specification to simplify notation. Further simplifications following the loudspeaker normalization are possible when specifying domain-specific constraints. Loudspeaker coordinates reduce to their unit-directions in the spatial domain given by $$ \begin{split}\bm{V}=\left[{\bm{v}_{1},\ldots,\bm{v}_{N}}\right]\in\mathbb{R}^{2\times N},\quad\bm{v}_{n}=\frac{\bm{u}_{n}}{\left\lVert\bm{u}_{n}\right\rVert}.\end{split} \tag{15} $$ The normalization filter’s electrical gain $\left|{G_{n}(\nu)}\right|$ bounds the electrical headroom in the electrical domain. The normalized loudspeaker acoustic responses in (4) are matched at the listener’s location in the acoustical domain. 
Spatial Panning Constraints: The vector-base amplitude panning with slack (VBAPS) constraint is given by $$ \begin{split}\bm{V}\bm{x}=\lambda\bm{s},\quad\bm{x}\geq\bm{0},\quad\lambda\geq 0,\end{split} \tag{16} $$ where the panning gains $\bm{x}$ are non-negative so as to preserve the relative-phase between loudspeaker pairs, and constrain the weighted average of the loudspeaker directions $\bm{V}$ to coincide with the target steering unit-direction $\bm{s}\in\mathbb{R}^{2\times 1}$ up to a non-negative scale given by the slack-variable $\lambda$. The latter is an augmented variable for both scaling the target unit-direction $\bm{s}$ to lie in equality with the panning direction $\bm{V}\bm{x}$ as shown in Fig. 4, and to accommodate constraints placed on $\bm{x}$ from other domains. The feasible steering and panning directions, and panning gains are therefore constrained as follows: <details> <summary>figs/vbaps.png Details</summary> ![f7ebcfb0](/v1/image/f7ebcfb04642c4948a7d3ae8b4ed1960e57c9c70a9c62cacca704938bfdf09ac) ### Visual Description ## Diagram: Feasible and Infeasible Sound Source Localization ### Overview The image is a diagram illustrating the concept of feasible and infeasible sound source localization. It depicts a listener, two sound sources (speakers), a target sound source, and regions indicating where sound source localization is feasible or infeasible. The diagram uses arrows and labels to show the relationships between these elements. ### Components/Axes * **Target s:** Located at the top-center of the diagram, represented by a speaker icon. * **Listener:** Located at the bottom-center of the diagram, represented by a silhouette of a person. * **Speakers (v1, v2):** Two speakers are positioned on the left and right sides of the diagram. * **v1:** Left speaker. * **v2:** Right speaker. * **Feasible s (Blue):** The region above the dotted blue arc, indicating where sound source localization is feasible. 
* **Infeasible s (Yellow):** The region below the dotted yellow arc, indicating where sound source localization is infeasible. * **λs:** A purple arrow pointing from the listener to a purple dot. * **v1x1 + v2x2:** Purple text above the purple dot. * **v1x1:** Red line from the left speaker to the listener. * **v2x2:** Red line from the right speaker to the listener. * **Dotted Blue Arc:** Separates the "Feasible s" region from the "Infeasible s" region. * **Dotted Yellow Arc:** Separates the "Feasible s" region from the "Infeasible s" region. ### Detailed Analysis * **Listener to Target:** The listener is positioned below the target sound source. * **Speakers:** The two speakers are positioned on either side of the listener. * **Feasible Region:** The feasible region is above the dotted blue arc, encompassing the target sound source. The text "Feasible s" is written in blue in this region. * **Infeasible Region:** The infeasible region is below the dotted yellow arc, encompassing the listener. The text "Infeasible s" is written in yellow in this region. * **Arrows:** * Red arrows connect each speaker to the listener, labeled as "v1x1" and "v2x2" respectively. * Red arrows connect each speaker to the purple dot, labeled as "v1x1" and "v2x2" respectively. * A purple arrow connects the listener to the purple dot, labeled as "λs". * A purple arrow connects the purple dot to the target speaker. * **Purple Dot:** Located between the listener and the target speaker, with the label "v1x1 + v2x2" above it. ### Key Observations * The diagram illustrates the concept of feasible and infeasible regions for sound source localization. * The position of the listener, speakers, and target sound source are key elements in determining the feasibility of localization. * The dotted arcs visually separate the feasible and infeasible regions. * The arrows and labels indicate the relationships between the different elements in the diagram. 
### Interpretation The diagram demonstrates a simplified model of sound source localization. The "Feasible s" region represents areas where the sound source can be accurately localized, while the "Infeasible s" region represents areas where localization is difficult or impossible. The position of the listener relative to the sound sources and the target plays a crucial role in determining the feasibility of localization. The variables v1, v2, x1, x2, and λ likely represent parameters related to the sound signals and spatial relationships, but without further context, their precise meaning is unclear. The purple dot likely represents a calculated or estimated location of the sound source based on the signals received by the listener. The diagram suggests that accurate sound source localization depends on the listener being positioned within a region where the sound signals from the speakers can be effectively processed. </details> <details> <summary>figs/headroom.png Details</summary> ![29b95b96](/v1/image/29b95b964c43a4b48190f99ea014c42cde43e05ff0d0d0bac1439c7cfc749998) ### Visual Description ## Diagram: Audio Source and Listener Setup ### Overview The image is a diagram illustrating the spatial arrangement of audio sources and a listener. It depicts a person (listener) at the center, surrounded by several audio sources (speakers, devices) positioned at varying distances and angles. The diagram includes distance markers and inequalities related to the positions. ### Components/Axes * **Listener:** Represented by a black silhouette of a person's head and shoulders at the center of the diagram. * **Audio Sources:** * `uL`: Speaker icon, located at the top-left, on the outer circle. * `uR`: Speaker icon, located at the top-right, on the outer circle. * `u1`: Speaker icon, located on the left, on the inner circle. * `u2`: Speaker icon, located to the left of `u1`, on the outer circle. * `uD`: Device icon (circular), located at the bottom-right, on the outer circle. 
* `uS`: Device icon (rectangular), located below `uD`, outside the outer circle. * **Distance Markers:** * Inner dotted circle: Labeled "1 meter". * Outer dotted circle: Labeled "2 meters". * **Lines:** * Purple lines: Connect the listener to `uL` and `uR`. * Red line: Connects the listener to `u1` and `u2`. * Green lines: Connect the listener to `uD` and `uS`. * **Inequalities:** * Near `u2`: "x1 ≤ 1, 2x2 ≤ 1" * Near `uR`: "2xL ≤ 1, 3xR ≤ 1" * Near `uS`: "xS ≤ 1, 4xD ≤ 1" ### Detailed Analysis * **Audio Source Positions:** * `uL` and `uR` are positioned on the outer circle (2 meters from the listener) at approximately 45 degrees from the vertical axis. * `u1` is positioned on the inner circle (1 meter from the listener) on the left side. * `u2` is positioned on the outer circle (2 meters from the listener) to the left of `u1`. * `uD` is positioned on the outer circle (2 meters from the listener) on the bottom-right. * `uS` is positioned outside the outer circle, below `uD`. * **Lines:** * The purple lines indicate the distance and direction from the listener to `uL` and `uR`. * The red line indicates the distance and direction from the listener to `u1` and `u2`. * The green lines indicate the distance and direction from the listener to `uD` and `uS`. * **Inequalities:** * "x1 ≤ 1, 2x2 ≤ 1": Relates to the positions of `u1` and `u2`. * "2xL ≤ 1, 3xR ≤ 1": Relates to the positions of `uL` and `uR`. * "xS ≤ 1, 4xD ≤ 1": Relates to the positions of `uS` and `uD`. ### Key Observations * The listener is centrally located. * Audio sources are positioned at different distances and angles around the listener. * The inequalities provide constraints on the positions of the audio sources. ### Interpretation The diagram illustrates a spatial audio setup, likely for research or engineering purposes. The listener is at the center, and the audio sources are positioned around them to create a specific sound field. 
The inequalities likely represent constraints or design parameters for the audio setup, possibly related to sound intensity, delay, or other acoustic properties. The diagram could be used to analyze or design spatial audio systems, such as those used in virtual reality, augmented reality, or sound localization applications. The different colors of the lines connecting the listener to the audio sources may represent different audio channels or signal paths. </details> Figure 4: VBAPS (left) constrains the feasible steering direction $\bm{s}$ to lie on the minor arc between the loudspeaker pair coordinates $\bm{x}_{L},\bm{x}_{R}$. Sample voltage constraints (right) are proportional to differences in loudspeaker-to-listener distance, orientation, and selection. Consider a set of $N$ loudspeakers and panning gains satisfying (16). The set of feasible steering unit-directions $\bm{s}$ must lie in the union of minor arcs between all pairwise loudspeaker unit-directions. Conversely, steering directions are infeasible along the major arc of a single loudspeaker pair ($N=2$) as shown in Fig. 4. For $N>2$ loudspeakers, the feasible $\bm{s}$ cover all directions in $\mathbb{R}^{2}$ iff there exists a set of three loudspeakers where the negative direction of each loudspeaker lies on the minor arc between the other two loudspeaker directions. The panning direction $\bm{V}\bm{x}$ is therefore constrained to be in the set of $\lambda$-scaled feasible unit-directions $\bm{s}$. We now introduce several evaluation metrics or objectives w.r.t. $\lambda$. 
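The pairwise feasibility condition for (16) can be checked numerically. The following is a minimal sketch, assuming azimuths are measured from the direction $[1;0]$ and fixing $\lambda=1$; the function names `unit` and `pair_gains` are hypothetical. It solves $\left[{\bm{v}_{1},\bm{v}_{2}}\right]\bm{x}=\bm{s}$ by Cramer's rule and reports the steering direction as infeasible when either gain is negative.

```python
import math

def unit(az_deg):
    """Unit-direction for an azimuth in degrees (0 degrees maps to [1, 0])."""
    a = math.radians(az_deg)
    return (math.cos(a), math.sin(a))

def pair_gains(az1_deg, az2_deg, steer_deg):
    """Solve [v1 v2] x = s by Cramer's rule; return None if infeasible."""
    v1, v2, s = unit(az1_deg), unit(az2_deg), unit(steer_deg)
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if abs(det) < 1e-12:
        return None  # collinear pair: not handled in this sketch
    x1 = (s[0] * v2[1] - s[1] * v2[0]) / det
    x2 = (v1[0] * s[1] - v1[1] * s[0]) / det
    if x1 < -1e-12 or x2 < -1e-12:
        return None  # s lies on the major arc of the pair
    return (max(x1, 0.0), max(x2, 0.0))

front = pair_gains(-30.0, 30.0, 0.0)     # steering between the pair
behind = pair_gains(-30.0, 30.0, 180.0)  # steering opposite the pair
```

For a pair at $\pm 30^{\circ}$, steering to $0^{\circ}$ returns two equal non-negative gains, while steering to $180^{\circ}$ returns `None` because that direction lies on the major arc.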
Let us define panning sensitivity as the acoustic-path distance ratio between the panning direction and the sum of the gain-weighted loudspeaker directions, given by $$ \begin{split}\mathbb{S}(\bm{V},\bm{x},\bm{s})=\frac{\left\lVert\bm{V}\bm{x}\right\rVert}{\sum_{n=1}^{N}\left\lVert\bm{v}_{n}x_{n}\right\rVert}=\frac{\left\lVert\lambda\bm{s}\right\rVert}{\sum_{n=1}^{N}x_{n}}=\frac{\lambda}{\bm{x}^{T}\bm{1}},\end{split} \tag{17} $$ which has bounds $0<\mathbb{S}(\bm{V},\bm{x},\bm{s})\leq 1$. Sensitivity is maximal iff non-zero panning gains belong to loudspeakers with directions coincident to the steering direction, large if panning gains disproportionately allocate to loudspeakers with directions closer to the steering direction, and minimal when panning gains allocate to loudspeakers with directions that sum to zero. Panning sensitivity therefore gives a similarity measure between panned and discrete sound-sources in the direction of $\bm{s}$. This contrasts with cross-domain measures of panning efficiency, which evaluate the power ratios between panning direction and electric or acoustic gain as follows: $$ \begin{split}\mathbb{F}(\bm{K},\bm{V},\bm{x})=\frac{\bm{x}^{T}\bm{V}^{T}\bm{V}\bm{x}}{\bm{x}^{T}\bm{K}\bm{x}}=\lambda^{2}\left\lVert\bm{K}^{\frac{1}{2}}\bm{x}\right\rVert^{-2},\end{split} \tag{18} $$ where $\bm{K}\in\mathbb{C}^{N\times N}$ is a domain-dependent covariance matrix (identity for electrical, model dependent for acoustical). For the electrical domain where $\bm{K}=\bm{I}$, the maximum efficiency is $N$ for loudspeakers with directions coincident to the steering direction and uniform panning gains $\bm{x}=N^{-1}\bm{1}$. For the acoustic domain, the maximum efficiency is the largest generalized eigenvalue of the pair $\bm{V}^{T}\bm{V}$ and $\bm{K}$. Thus, higher panning efficiency is realized via more uniformly distributed panning gains across loudspeakers, whereas high panning sensitivity follows sparsely distributed panning gains. 
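As a worked illustration of (17) and (18), the sketch below evaluates both metrics for given loudspeaker azimuths and non-negative gains. It is a minimal example rather than the paper's implementation: the function name `panning_metrics` is hypothetical, azimuths are assumed to be measured from the direction $[1;0]$, and $\bm{K}$ defaults to the identity (electrical domain).

```python
import math

def panning_metrics(azimuths_deg, x, K=None):
    """Return (sensitivity, efficiency) per (17) and (18).

    Assumes non-negative gains x that are not all zero.
    K is an N x N nested list; None means the identity matrix.
    """
    n = len(x)
    vx = sum(g * math.cos(math.radians(a)) for a, g in zip(azimuths_deg, x))
    vy = sum(g * math.sin(math.radians(a)) for a, g in zip(azimuths_deg, x))
    pan_power = vx * vx + vy * vy                 # x^T V^T V x = lambda^2
    if K is None:
        quad = sum(g * g for g in x)              # x^T I x
    else:
        quad = sum(x[i] * K[i][j] * x[j] for i in range(n) for j in range(n))
    sensitivity = math.sqrt(pan_power) / sum(x)   # ||V x|| / (x^T 1)
    return sensitivity, pan_power / quad

# Four coincident loudspeakers with uniform gains, and a pair at +/-30 degrees.
coincident = panning_metrics([0.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25])
pair = panning_metrics([-30.0, 30.0], [0.5, 0.5])
```

The coincident case attains sensitivity $1$ and electrical efficiency $N=4$, matching the stated maxima, whereas the $\pm 30^{\circ}$ pair gives sensitivity $\cos 30^{\circ}$ and efficiency $1.5$.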
Electrical Headroom Constraints: The electrical-power headroom of normalized loudspeakers decreases in proportion to the normalization filter power responses $\left|{G_{n}(\nu)}\right|^{2}$. Under the non-negative panning constraint, the quadratic electrical-power constraints are linearized as follows: $$ \begin{split}x_{n}x_{n}^{*}&\leq\left|{G_{n}(\nu)}\right|^{-2},\quad x_{n}\geq 0,\quad\Rightarrow\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{19} $$ where $\bm{\tau}=\left[{\left|{G_{1}(\nu)}\right|^{-1},\ldots,\left|{G_{N}(\nu)}\right|^{-1}}\right]^{T}\in\mathbb{R}^{N\times 1}_{\geq 0}$ is a vector containing the digital headroom per loudspeaker that bounds the feasible space of panning gains to the upper box-orthant. We give several examples of voltage headroom consumed by normalization in Fig. 4. Doubling the distance from loudspeaker $\bm{u}_{1}$ to the listener, to match that of $\bm{u}_{2}$, halves the voltage headroom. Re-orienting the loudspeaker $\bm{u}_{R}$ to face the listener at $\bm{u}_{L}$ lowers high-frequency headroom. Equalizing the mid-range loudspeaker at $\bm{u}_{D}$ to match the full-range loudspeaker at $\bm{u}_{S}$ decreases the low-frequency headroom. Acoustical Power Constraints: The acoustic covariances between the normalized loudspeaker transfer functions $R_{n}(\nu,\bm{r})$ in (14), over coordinates $\bm{r}$ in the listening area, specify quadratic power constraints in equality to the acoustic power target $\rho$ as follows: $$ \begin{split}\bm{x}^{T}\bm{K}\bm{x}&=\rho,\quad K_{ij}\approx\mathbb{E}_{\bm{r}\sim g(\bm{r})}\left[{R_{i}(\nu,\bm{r})R_{j}^{*}(\nu,\bm{r})}\right],\end{split} \tag{20} $$ whereby $\bm{r}$ is sampled from a disc of radius $\tau_{r}$ with a truncated uniform PDF $g(\bm{r})=\frac{1}{\pi\tau_{r}^{2}},\forall\,\left\lVert\bm{r}\right\rVert\leq\tau_{r}$, and $0$ otherwise. 
For loudspeaker transfer functions in the far-field, spherical-waves can be approximated by plane-waves, which give the acoustic covariance matrix $\bar{\bm{K}}$ with analytic terms $\bar{K}_{ij}$ as derived in appendix (31) as follows: $$ \begin{split}\bar{K}_{ij}=\left|{S(\nu,0)}\right|^{2}\left\{\begin{array}{cc}\frac{2J_{1}\left({D_{ij}\kappa\tau_{r}}\right)}{D_{ij}\kappa\tau_{r}},&D_{ij}\kappa\tau_{r}>0\\ 1,&D_{ij}\kappa\tau_{r}=0\end{array}\right.,\end{split} \tag{21} $$ where $D_{ij}=\left\lVert\bm{v}_{i}-\bm{v}_{j}\right\rVert$ is the distance between loudspeaker unit-directions, and $J_{1}(z)$ is the Bessel function of the first kind of order one. Note that at the listener location $\bm{r}=\bm{0}$, the normalized loudspeaker transfer functions are constant in (4). Thus, the acoustic covariance matrix $\bar{\bm{K}}$ degenerates to the rank-1 matrix $\mathring{\bm{K}}=\left|{S(\nu,0)}\right|^{2}\bm{1}\bm{1}^{T}$ as the evaluation radius decreases to zero in $\lim_{\tau_{r}\to 0}g(\bm{r})=\delta(\bm{r})$. We therefore decompose the acoustic covariance as follows: Let the acoustic covariance matrix in (20) be a mixture of the listener location, and listening area covariances given by $$ \begin{split}\bm{K}=(1-\alpha)\mathring{\bm{K}}+\alpha\bar{\bm{K}},\quad 0\leq\alpha\leq 1,\end{split} \tag{22} $$ where the acoustic covariance for $\alpha=0$ evaluates only the direct acoustic transfer function from loudspeakers to the listener location. In this case, the quadratic constraints in (20) linearize to $\bm{x}^{T}\bm{1}=\sqrt{\rho}\left|{S(\nu,0)}\right|^{-1}$ for non-negative $\bm{x}$; maximizing $\lambda$ s.t. the linear gain summation constraint maximizes the panning sensitivity. Conversely, the acoustic covariance for $\alpha=1$ evaluates the acoustic transfer functions over a larger listening area; maximizing $\lambda$ s.t. the quadratic equality constraint maximizes panning efficiency. 
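A minimal sketch of (21) and (22) follows, under several assumptions that are not stated in this section: $\kappa=2\pi\nu/c$ is taken to be the wavenumber with $c=343$ m/s, $\left|{S(\nu,0)}\right|=1$, and azimuths are measured from the direction $[1;0]$. The names `bessel_j1` and `covariance_mixture` are hypothetical, and $J_{1}$ is evaluated by midpoint quadrature of Bessel's integral rather than by a library call.

```python
import math

def bessel_j1(z):
    """J1(z) from Bessel's integral (1/pi) * int_0^pi cos(t - z sin t) dt.

    The midpoint rule converges rapidly here because the integrand is
    even and periodic; the node count grows with |z|.
    """
    n = int(abs(z)) + 32
    nodes = (math.pi * (k + 0.5) / n for k in range(n))
    return sum(math.cos(t - z * math.sin(t)) for t in nodes) / n

def covariance_mixture(azimuths_deg, freq_hz, radius_m, alpha, speed=343.0):
    """K = (1 - alpha) * 1 1^T + alpha * K_bar, with |S(nu, 0)| = 1 assumed."""
    kappa = 2.0 * math.pi * freq_hz / speed      # assumed wavenumber definition
    dirs = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
            for a in azimuths_deg]
    K = []
    for vi in dirs:
        row = []
        for vj in dirs:
            z = math.hypot(vi[0] - vj[0], vi[1] - vj[1]) * kappa * radius_m
            k_bar = 1.0 if z == 0.0 else 2.0 * bessel_j1(z) / z
            row.append((1.0 - alpha) + alpha * k_bar)
        K.append(row)
    return K

# Center, left, right at 0 and +/-30 degrees over a 0.5 m evaluation radius.
K_low = covariance_mixture([0.0, -30.0, 30.0], 20.0, 0.5, alpha=1.0)
K_high = covariance_mixture([0.0, -30.0, 30.0], 8000.0, 0.5, alpha=1.0)
```

With these settings the off-diagonal terms of `K_low` stay close to one, while those of `K_high` are much smaller in magnitude, so the listening-area covariance moves from nearly rank-1 towards the identity as frequency increases.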
Moreover, in the limits, the loudspeaker acoustic covariances in the listening area are fully correlated at low frequencies $\lim_{\kappa\to 0}\bar{\bm{K}}=\mathring{\bm{K}}$, and uncorrelated at high frequencies or large evaluation radii $\lim_{\kappa\to\infty}\bar{\bm{K}}=\lim_{\tau_{r}\to\infty}\bar{\bm{K}}=\bm{I}$. Therefore, the mixture of covariances (22) is proportional to $\bm{K}\propto(1-\alpha)\bm{1}\bm{1}^{T}+\alpha\bm{I}$. We now formulate the loudspeaker steering optimization w.r.t. spatial, electrical, and acoustical constraints. Optimal Panning Sensitivity and Efficiency (OPSE): Maximizing the panning sensitivity $\lambda$ subject to spatial, acoustical, and electrical constraints is the second-order cone problem [23] given by $$ \begin{split}(\lambda_{*},\bm{x}_{*})&=\arg\max_{\lambda,\bm{x}}\,\lambda\qquad\textrm{s.t.}\quad\lambda\geq 0,\\ \bm{V}\bm{x}&=\lambda\bm{s},\quad\bm{x}^{T}\bm{K}\bm{x}\leq\rho,\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{23} $$ where a feasible solution always exists if the acoustic loudness’s equality constraint in (20) is relaxed to an inequality; acoustic loudness is tight w.r.t. $\rho$ if panning sensitivity (17) or efficiency (18) is also maximized. We can eliminate $\lambda$ by left-multiplying both sides of the equality constraints in (23) by the unit-direction $\bm{s}^{T}$ to yield $\lambda=\bm{s}^{T}\bm{V}\bm{x}$, which gives the equality constraint matrix $\bm{A}=(\bm{I}-\bm{s}\bm{s}^{T})\bm{V}$. The equivalent optimization in only $\bm{x}$ is expressed as follows: $$ \begin{split}\bm{x}_{*}&=\arg\max_{\bm{x}}\,\bm{c}^{T}\bm{x}\qquad\textrm{s.t.}\quad\bm{c}^{T}\bm{x}\geq 0,\\ \bm{A}\bm{x}&=\bm{0},\quad\bm{x}^{T}\bm{K}\bm{x}\leq\rho,\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{24} $$ where the objective maximizes the panning gains $\bm{x}$ in the direction of vector $\bm{c}=\bm{V}^{T}\bm{s}$, consisting of cosine similarities between the target and loudspeaker unit-directions. 
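The elimination of $\lambda$ in (24) can be illustrated numerically. The sketch below, with the hypothetical function name `eliminate_lambda` and azimuths assumed to be measured from the direction $[1;0]$, builds $\bm{c}=\bm{V}^{T}\bm{s}$ and $\bm{A}=(\bm{I}-\bm{s}\bm{s}^{T})\bm{V}$ for 2-D unit-directions and checks a candidate gain vector against the equality constraint.

```python
import math

def eliminate_lambda(azimuths_deg, steer_deg):
    """Return (c, A) with c = V^T s and A = (I - s s^T) V, A stored as 2 rows."""
    s = (math.cos(math.radians(steer_deg)), math.sin(math.radians(steer_deg)))
    c, A = [], [[], []]
    for a in azimuths_deg:
        v = (math.cos(math.radians(a)), math.sin(math.radians(a)))
        cn = v[0] * s[0] + v[1] * s[1]     # cosine similarity s^T v_n
        c.append(cn)
        A[0].append(v[0] - cn * s[0])      # part of v_n orthogonal to s
        A[1].append(v[1] - cn * s[1])
    return c, A

# 5.0 layout ordered as center, left, right, surround left, surround right.
layout = [0.0, -30.0, 30.0, -110.0, 110.0]
c, A = eliminate_lambda(layout, 0.0)
x = [1.0, 0.5, 0.5, 0.0, 0.0]              # symmetric candidate gains
residual = [sum(a * g for a, g in zip(row, x)) for row in A]
lam = sum(cn * g for cn, g in zip(c, x))   # lambda = c^T x
```

For the symmetric gains above the residual $\bm{A}\bm{x}$ vanishes and $\lambda=\bm{c}^{T}\bm{x}=1+\cos 30^{\circ}$, whereas an asymmetric choice such as engaging only the left loudspeaker alongside the center leaves a non-zero residual and is therefore not a valid panning towards the center.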
Moreover, the equality constraints restrict $\bm{x}$ to the null space of $\bm{A}$, which has nullity $N-1$. Thus for real-time applications and a small number of loudspeakers $(N\leq 5)$, we remove the equality constraints and reduce the number of variables via the linear transformation of the panning gains $\bm{x}=\bar{\bm{A}}\bm{y}$, where $\bar{\bm{A}}\in\mathbb{R}^{N\times(N-1)}$ is an orthonormal basis $\bar{\bm{A}}^{T}\bar{\bm{A}}=\bm{I}$ whose columns span the null space $\textrm{ker}\left({\bm{A}}\right)$. The optimization in the kernel space reduces to linear and quadratic inequality constraints given by $$ \begin{split}\bm{y}_{*}&=\arg\max_{\bm{y}}\,\bar{\bm{c}}^{T}\bm{y}\quad\textrm{s.t.}\,\,\,\begin{array}{c}\bar{\bm{c}}^{T}\bm{y}\geq 0,\\ \bm{0}\leq\bar{\bm{A}}\bm{y}\leq\bm{\tau},\end{array}\,\,\,\bm{y}^{T}\bar{\bm{K}}\bm{y}\leq\rho,\end{split} \tag{25} $$ where $\bar{\bm{c}}=\bar{\bm{A}}^{T}\bm{c}$, and $\bar{\bm{K}}=\bar{\bm{A}}^{T}\bm{K}\bar{\bm{A}}$, and the feasible region is convex. Lastly, the steering direction $\bm{s}$ can be infeasible where only the trivial solution $\bm{x}=\bm{0}$ satisfies the VBAPS equality constraint; dropping the VBAPS constraints $\bm{A}\bm{x}=\bm{0}$ and $\bm{c}^{T}\bm{x}\geq 0$ in the primary form (24) relaxes the feasible space to be convex. Therefore, optimal solutions for both the null space (25) and relaxed primary forms can be efficiently found via interior-point methods. Let us now investigate the solutions to (23), (24), (25) under various acoustic power, covariance, and loudspeaker layouts in practical applications. 4 Experiments Distributed Center Channel: In the $5.0$ multichannel standard, the center content channel is fully sent to a center loudspeaker in a $5.0$ ITU layout (left = $-30^{\circ}$, right = $30^{\circ}$, center = $0^{\circ}$, surround left = $-110^{\circ}$, surround right = $110^{\circ}$), where the maximum acoustic power (unity) is limited to that of a single loudspeaker. 
Under OPSE, we can specify a larger acoustic power target $\rho$ via the equality constraint $\bm{x}^{T}\bm{K}\bm{x}=\rho$, spatial panning constraints of a center steering direction $\bm{s}=\left[{1;0}\right]$, and unity electrical constraints $\bm{x}\leq\bm{1}$ without loss of generality. The optimal panning sensitivity gains for the listener location’s acoustic covariance $\bm{K}=\bm{1}\bm{1}^{T}$ are shown in Fig. 5 for increasing acoustic power $\rho$ targets. For acoustic power targets $0<\rho\leq 1$, only the center loudspeaker is active $0<x_{C}\leq 1$, and panning sensitivity is maximum. For $1<\rho\leq 9$, the center loudspeaker exhausts its headroom and the left and right loudspeakers equally engage $(0<x_{L,R}\leq 1,\,x_{C}=1)$, resulting in a slight loss in panning sensitivity ($0.9$ at $\rho=9$), and an increase in both panning/electric and acoustic/electric efficiency. For $9<\rho\leq 25$, the left and right loudspeakers exhaust their headroom and the surround loudspeakers equally engage $(0<x_{SL,SR}\leq 1,\,x_{L,R,C}=1)$, resulting in a sharper loss in panning sensitivity and a degradation in panning/electric efficiency as the center steering direction lies in the infeasible sector of the surround loudspeaker pair. Note that for inequality constraints $\bm{x}^{T}\bm{K}\bm{x}\leq\rho$, the surround panning gains remain inactive as the quadratic constraint is not tight for $\rho>9$. Panning sensitivity therefore monotonically decreases for larger acoustic power targets. <details> <summary>figs/vary_acoustic_pow.png Details</summary> ![0440545a](/v1/image/0440545a0068bfcb8a33d2f988ab24ee0fd7c1b269eca2024de7610d9dc229f9) ### Visual Description ## Chart: Optimal Panning Gains for Varying Acoustic Power Targets ### Overview The image is a line chart that displays the relationship between acoustic power target (ρ) and various gain and efficiency metrics. The x-axis represents the acoustic power target ρ, while the left y-axis represents "Gain" and the right y-axis represents "Efficiency". 
Several lines represent different panning strategies and their corresponding gains or efficiencies. ### Components/Axes * **Title:** Optimal Panning Gains for Varying Acoustic Power Targets * **X-axis:** * Label: Acoustic Power Target ρ for Constraint xᵀKx = ρ, K = 11ᵀ, x ≤ 1 * Scale: 0 to 25, with major ticks at 0, 1, 9, 15, 20, and 25. * **Left Y-axis:** * Label: Gain * Scale: 0 to 1.5, with major ticks at 0, 0.5, 1, and 1.5. * **Right Y-axis:** * Label: Efficiency * Scale: 0 to 7, with major ticks at 0, 1, 2, 3, 4, 5, 6, and 7. * **Legend:** Located at the top-center of the chart. * **Sensitivity (Solid Blue):** λ/xᵀ1 * **Center (Dashed Blue):** xC * **Fronts (Dotted Blue):** xL, xR * **Surrounds (Dash-Dot Blue):** xSL, xSR * **Panning/Electric (Solid Orange):** λ²/xᵀx * **Panning/Acoustic (Dashed Orange):** λ²/xᵀKx * **Acoustic/Electric (Dotted Orange):** xᵀKx/xᵀx ### Detailed Analysis * **Sensitivity (Solid Blue):** Starts at a gain of approximately 1.0 at ρ = 0, remains constant until ρ = 1, then decreases to approximately 0.4 at ρ = 25. * **Center (Dashed Blue):** Starts at a gain of approximately 0 at ρ = 0, remains constant until ρ = 9, then increases linearly to approximately 1.0 at ρ = 25. * **Fronts (Dotted Blue):** Starts at a gain of approximately 0 at ρ = 0, increases to approximately 0.9 at ρ = 9, then increases slightly to approximately 1.0 at ρ = 25. * **Surrounds (Dash-Dot Blue):** Starts at a gain of approximately 0.2 at ρ = 0, remains constant at approximately 0.2 until ρ = 1, then increases to approximately 0.4 at ρ = 25. * **Panning/Electric (Solid Orange):** Starts at an efficiency of approximately 1.0 at ρ = 0, increases to approximately 2.5 at ρ = 1, then increases to approximately 3.0 at ρ = 9, then decreases to approximately 1.0 at ρ = 25. * **Panning/Acoustic (Dashed Orange):** Starts at an efficiency of approximately 0.2 at ρ = 0, remains constant at approximately 0.2 until ρ = 25. 
* **Acoustic/Electric (Dotted Orange):** Starts at an efficiency of approximately 1.0 at ρ = 0, increases to approximately 5.0 at ρ = 25. ### Key Observations * The "Sensitivity" gain decreases as the acoustic power target increases beyond a value of 1. * The "Center" gain increases linearly with the acoustic power target after a value of 9. * The "Fronts" gain increases rapidly initially and then plateaus. * The "Surrounds" gain increases linearly with the acoustic power target after a value of 1. * The "Panning/Electric" efficiency initially increases and then decreases as the acoustic power target increases. * The "Panning/Acoustic" efficiency remains relatively constant regardless of the acoustic power target. * The "Acoustic/Electric" efficiency increases steadily with the acoustic power target. ### Interpretation The chart illustrates how different panning strategies perform in terms of gain and efficiency as the acoustic power target changes. The "Sensitivity" gain decreases with increasing acoustic power, suggesting that it might be less effective at higher power targets. The "Center" gain increases, indicating that it becomes more prominent at higher power targets. The "Fronts" gain quickly reaches a plateau, suggesting a limited dynamic range. The "Surrounds" gain increases steadily, indicating a consistent contribution across different power targets. The efficiency metrics show how effectively each panning strategy utilizes power. The "Panning/Electric" efficiency peaks and then declines, suggesting an optimal power target range. The "Panning/Acoustic" efficiency remains constant, indicating a consistent power usage. The "Acoustic/Electric" efficiency increases, suggesting that it becomes more efficient at higher power targets. </details> Figure 5: OPSE center content more uniformly distributes across $5.0$ ITU loudspeakers for increasing acoustic power targets $\rho$ , and constant electrical headroom. 
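The breakpoints $\rho=1,9,25$ and the quoted sensitivity of about $0.9$ at $\rho=9$ can be reproduced with a few lines of arithmetic. The sketch below is not the OPSE solver. It assumes a $5.0$ layout with azimuths $0^{\circ}$, $\pm 30^{\circ}$, $\pm 110^{\circ}$ (the surround angle is an assumption), fills the center, front, and surround groups in the order described above, and takes $\lambda$ to be the component of the gain-weighted direction sum along $\bm{s}$. The names `fill_gains` and `sensitivity` are hypothetical.

```python
import numpy as np

# Assumed azimuths in degrees for (C, L, R, SL, SR); 110 degrees is an assumption.
ANGLES_DEG = np.array([0.0, 30.0, -30.0, 110.0, -110.0])
GROUPS = [[0], [1, 2], [3, 4]]  # center, fronts, surrounds


def fill_gains(rho):
    """Gains meeting x^T K x = rho for K = 1 1^T with headroom x <= 1.

    For K = 1 1^T the acoustic power is (1^T x)^2, so the gains must sum to
    sqrt(rho). Groups are filled in the order given in the text.
    """
    remaining = np.sqrt(rho)
    x = np.zeros(len(ANGLES_DEG))
    for group in GROUPS:
        share = min(1.0, max(remaining, 0.0) / len(group))
        x[group] = share
        remaining -= share * len(group)
    return x


def sensitivity(x):
    """Panning sensitivity lambda / (x^T 1) for steering direction s = [1; 0]."""
    lam = x @ np.cos(np.radians(ANGLES_DEG))
    return lam / x.sum()


for rho in (1.0, 9.0, 25.0):
    x = fill_gains(rho)
    print(rho, np.round(x, 3), round(float(sensitivity(x)), 3))
```

Under these assumptions the sensitivity is $1$ at $\rho=1$, $(1+2\cos 30^{\circ})/3\approx 0.91$ at $\rho=9$, and roughly $0.41$ at $\rho=25$ once the surrounds are fully engaged.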
Diffuse-field Panning: In reverberant environments, the acoustic covariance between well-separated loudspeakers in the listening area decreases due to increasing variations in the acoustic reflection path responses. Normalized loudspeakers produce a mixture of correlated sound-fields from their direct acoustic paths and less correlated diffuse-fields from their reflection paths over a listening area. The acoustic covariance in the listening area is therefore proportional to (22). Let us reconsider the previous case of a center channel distributed over a $3.0$ ITU layout (left = $-30^{\circ}$ , right = $30^{\circ}$ , center = $0^{\circ}$ ). Under OPSE, we constrain the acoustic power to unity $\bm{x}^{T}\bm{K}\bm{x}=1$ , relax the electrical headroom $\bm{x}≤\bm{10}$ , and vary the mixture coefficient $\alpha$ of the acoustic covariance $\bm{K}=(1-\alpha)\bm{1}\bm{1}^{T}+\alpha\bm{I}$ as shown in Fig. 6. For correlated sound-fields $0≤\alpha≤ 1-\bm{s}^{T}\bm{v}_{L}$ (approximately $0.134$ for fronts at $\pm 30^{\circ}$ ), only the center loudspeaker is active as panning sensitivity is maximum. For less correlated sound-fields $1-\bm{s}^{T}\bm{v}_{L}<\alpha≤ 1$ , the center loudspeaker attenuates relative to the left and right loudspeakers, as more uniformly distributed gains yield both higher panning/acoustic and panning/electric efficiency. The gap between the acoustic/electric efficiency and its theoretical Rayleigh quotient maximum, given by the largest eigenvalue of $\bm{K}$ , closes at the diffuse-field limit $\alpha=1$ . OPSE therefore converges to the principal eigenvector of $\bm{K}$ (the eigenvector of its largest eigenvalue) under diffuse-field conditions where source localization is difficult.

<details> <summary>figs/vary_alpha.png Details</summary>

![a3b08de3](/v1/image/a3b08de303caa955e78be5f63c7579e18c2d72f1a70bd2c495ba997b1448ef7f)

Line chart titled "Optimal Panning Gains for Varying Acoustic Covariances". X-axis: mixture coefficient α for K = (1 − α)11ᵀ + αI, xᵀKx = 1, x ≤ 10 (0 to 1, with a marker at 0.1340). Left y-axis: Gain. Right y-axis: Efficiency. Plotted series: sensitivity λ/xᵀ1; gains xC (center), xL, xR (fronts); efficiencies panning/electric λ²/xᵀx, panning/acoustic λ²/xᵀKx, and acoustic/electric xᵀKx/xᵀx; and the reference curve max xᵀKx/xᵀx.

</details>

Figure 6: OPSE center content gains for $3.0$ ITU loudspeakers converge to the acoustic/electric Rayleigh quotient maximizer in diffuse-field conditions.
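The two reference quantities in this experiment are easy to compute directly. The following sketch assumes the mixture covariance $\bm{K}=(1-\alpha)\bm{1}\bm{1}^{T}+\alpha\bm{I}$ from the axis label of Fig. 6 and evaluates the breakpoint $1-\bm{s}^{T}\bm{v}_{L}$ for fronts at $\pm 30^{\circ}$ together with the Rayleigh quotient maximum of $\bm{K}$. The helper names are hypothetical, and the OPSE problem itself is not solved here.

```python
import numpy as np


def mixture_covariance(alpha, n=3):
    """K = (1 - alpha) 1 1^T + alpha I for n loudspeakers."""
    return (1.0 - alpha) * np.ones((n, n)) + alpha * np.eye(n)


def acoustic_electric_efficiency(x, K):
    """Rayleigh quotient x^T K x / x^T x."""
    x = np.asarray(x, dtype=float)
    return float(x @ K @ x / (x @ x))


def rayleigh_max(K):
    """Largest eigenvalue of the symmetric matrix K (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(K)[-1])


# Breakpoint below which only the center loudspeaker is active.
alpha_break = 1.0 - np.cos(np.radians(30.0))
print(round(float(alpha_break), 4))

for alpha in (0.0, 0.5, 1.0):
    K = mixture_covariance(alpha)
    center_only = acoustic_electric_efficiency([0.0, 1.0, 0.0], K)
    print(alpha, center_only, rayleigh_max(K))
```

For this $\bm{K}$ the largest eigenvalue is $3-2\alpha$ with eigenvector $\bm{1}$, so uniform gains attain the Rayleigh maximum, and at $\alpha=1$ every gain vector has unit acoustic/electric efficiency.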
Circular Panning Across Loudspeaker Layouts: For adaptive multichannel reproduction, it is desirable to render content channels over the common loudspeaker layouts shown in Fig. 7 for any listener location and front direction. Under OPSE, we can evaluate the panning sensitivity for all steering directions in azimuth in both anechoic $\bm{K}=\bm{1}\bm{1}^{T}$ and diffuse-field $\bm{K}=\bm{I}$ conditions. Let us constrain the acoustic power to unity $\bm{x}^{T}\bm{K}\bm{x}=1$ , relax the electrical headroom $\bm{x}≤\bm{10}$ , and vary $\bm{s}=[\cos\theta;\,\sin\theta]$ over the half-circle $0≤\theta≤\pi$ , as the layouts are symmetric w.r.t. $\theta=0$ . For layouts with only frontal loudspeakers, such as LRC and wide LRC, the panning sensitivity remains high $(>0.85)$ for feasible steering directions. For infeasible steering directions, the VBAPS constraints are dropped in (24), and the panning sensitivity, taken to be $\bm{c}^{T}\bm{x}/\bm{x}^{T}\bm{1}$ , decreases for larger $\theta$ . The solutions are continuous w.r.t. $\theta$ for the anechoic covariance but discontinuous for the diffuse-field covariance at the feasibility boundary of $\theta$ . For triangular loudspeaker layouts (Surround LRC, LRRear) containing the listener, only two of the three loudspeakers are active for any given $\theta$ . The solutions therefore uniquely satisfy the VBAPS constraints and are equivalent in both anechoic and diffuse-field conditions. LRRear has acceptable panning sensitivity for $\left|{\theta}\right|≤ 30^{\circ}$ , but minimal panning sensitivity near the surround steering angles $100^{\circ}≤\theta≤ 110^{\circ}$ . Surround LRC has low panning sensitivity for the left and right steering angles $\theta=\pm 30^{\circ}$ . For the LRSLSR layout, the panning sensitivity degrades in diffuse-field conditions for frontal angles $\left|{\theta}\right|≤ 60^{\circ}$ , and is minimal in the surround loudspeaker pair’s gap $110^{\circ}≤\theta≤ 250^{\circ}$ . For the pentagon layout of uniformly spaced loudspeakers, anechoic and diffuse-field conditions have acceptable $(>0.8)$ and borderline $(>0.7)$ panning sensitivity respectively, with the latter also having lower variance. Under OPSE, the pentagon layout is therefore suited to uniform circular panning over all directions, LRSLSR to non-rear directional panning, and wide LRC to frontal through semi-surround directional panning for content reproduction.

<details> <summary>figs/vary_layouts_mixed.png Details</summary>

![e0240f90](/v1/image/e0240f904cf66f9e5be8e2415c3a1cb6b35d8fdd2212d0e8caabc78aa45e9453)

Three panels. Left: "Loudspeaker Layouts", a circular diagram around the listener with angular markings at 0°, 30°, 60°, 72°, 110°, 144°, 180°, 216°, 250°, 288°, 300°, and 330°. Right: two line plots under the title "Maximum Circular Panning Sensitivity for Common Loudspeaker Layouts under Varying Acoustic Covariances", one for acoustic covariance K = 11ᵀ and one for K = I. X-axis: Steering Direction Azimuth (Degrees), 0 to 180. Y-axis: Panning Sensitivity, 0 to 1. Legend: LRC, Wide LRC, Surround LRC, LRRear, LRSLSR, Pentagon.

</details>

Figure 7: Maximum panning sensitivity in (24) varies across azimuth steering directions for different loudspeaker layouts, and between anechoic and diffuse-field conditions. All triangle arrangements have low-sensitivity gaps. A minimum of five uniformly spaced loudspeakers (Pentagon) achieves moderate sensitivity in diffuse-field conditions across all azimuths.

5 Discussion

While Bayesian loudspeaker normalization and the OPSE formulations are sound-field correction methods that require no acoustic measurements, their accuracy may be improved with indirect acoustic measurements.
Loudspeaker acoustic covariances at the listening area, critical distances, and distance attenuation rates may be estimated from simplified room acoustic models such as image-sources [29] if the room dimensions are known, and from acoustic transfer functions between loudspeakers and microphones co-located on supporting smart-loudspeaker devices. In such instances, we may couple loudspeaker normalization with OPSE by substituting the in-situ loudspeaker acoustic covariance estimates in place of the anechoic and plane-wave mixture. Extension to 3D layouts can be considered for some loudspeaker arrangements, but satisfying the VBAPS constraints and ensuring feasible coverage over spherical coordinates may be impractical.

6 Conclusion

We presented a loudspeaker filtering method that normalizes multiple loudspeakers to a common acoustic target for a non-stationary listening location or area. Loudspeaker normalization angles w.r.t. the listener location were adapted via Bayesian posteriors over circular distribution probability density functions. We then formulated panning gain optimization problems by relaxing VBAP constraints to give a novel panning sensitivity / efficiency objective, and specified electrical and acoustic domain constraints. Augmented, primary, null-space, and relaxed forms of the OPSE problem were derived. Lastly, practical experiments quantified the OPSE solutions in applications for distributing excess content channels, evaluating efficiency in anechoic to diffuse-field conditions, and recommending loudspeaker layouts for circular panning and multichannel content reproduction.
7 Appendix

Circular Distribution: We can evaluate $P(-a≤\theta≤ a)$ for $f(\theta)$ in (11) via a power-series expansion of the exponential and the cosine integral [30]: $$ \begin{split}\int_{-a}^{a}e^{\frac{-d^{2}(\theta)}{2\ell^{2}}}d\theta&=\int_{-a}^{a}e^{\frac{\cos(\theta)-1}{\ell^{2}}}d\theta=e^{-\frac{1}{\ell^{2}}}\sum_{n=0}^{\infty}\frac{\int_{-a}^{a}\cos^{n}(\theta)\,d\theta}{\ell^{2n}n!}.\end{split} \tag{26} $$ The product of circular distributions is a circular distribution, as the sum of weighted and phase-shifted cosines in the exponents is itself a cosine: $$ \begin{split}f_{i}(\theta)&\propto e^{\frac{-d^{2}(\theta-\mu_{i})}{2\ell_{i}^{2}}}\propto e^{\frac{\cos(\theta-\mu_{i})}{\ell_{i}^{2}}}\quad\Rightarrow\\ f_{i}(\theta)f_{j}(\theta)&\propto e^{\frac{\ell_{j}^{2}\cos(\theta-\mu_{i})+\ell_{i}^{2}\cos(\theta-\mu_{j})}{\ell_{i}^{2}\ell_{j}^{2}}}=e^{\frac{\cos(\theta-\mu_{ij})}{\ell_{ij}^{2}}},\end{split} \tag{27} $$ where the mean and dispersion of the product are given by $$ \begin{split}\mu_{ij}&=\operatorname{atan2}\left({\frac{\sin(\mu_{i})}{\ell_{i}^{2}}+\frac{\sin(\mu_{j})}{\ell_{j}^{2}},\,\frac{\cos(\mu_{i})}{\ell_{i}^{2}}+\frac{\cos(\mu_{j})}{\ell_{j}^{2}}}\right),\\ \ell_{ij}^{2}&=\ell_{i}^{2}\ell_{j}^{2}\left({\ell_{i}^{4}+2\cos(\mu_{i}-\mu_{j})\ell_{i}^{2}\ell_{j}^{2}+\ell_{j}^{4}}\right)^{-\frac{1}{2}}.\end{split} \tag{28} $$ Unlike that of a normal distribution, the posterior dispersion depends on both the prior mean and the prior dispersion. Substituting the empirical mean $\mu_{i}=\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$ and dispersion $\ell_{i}=\bar{\ell}_{n}^{\left\{{t}\right\}}$ , and the prior’s mean $\mu_{j}=\mu_{n}^{\left\{{t-1}\right\}}$ and dispersion $\ell_{j}=\ell_{n}^{\left\{{t-1}\right\}}$ , in (13) gives the posterior mean $\mu_{ij}=\mu_{n}^{\left\{{t}\right\}}$ and dispersion $\ell_{ij}=\ell_{n}^{\left\{{t}\right\}}$ .
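The product rule in (28) translates directly into a small update function. The sketch below is one possible NumPy rendering, with the hypothetical names `circular_product` and `exponent_gap`. The second function checks numerically that the summed exponents in (27) collapse to a single cosine with the mean and dispersion from (28).

```python
import numpy as np


def circular_product(mu_i, ell_i, mu_j, ell_j):
    """Mean and dispersion of the product of two circular densities, per (28)."""
    w_i, w_j = 1.0 / ell_i**2, 1.0 / ell_j**2  # cosine weights in the exponents
    mu = np.arctan2(w_i * np.sin(mu_i) + w_j * np.sin(mu_j),
                    w_i * np.cos(mu_i) + w_j * np.cos(mu_j))
    ell_sq = (ell_i**2 * ell_j**2) / np.sqrt(
        ell_i**4 + 2.0 * np.cos(mu_i - mu_j) * ell_i**2 * ell_j**2 + ell_j**4)
    return float(mu), float(np.sqrt(ell_sq))


def exponent_gap(mu_i, ell_i, mu_j, ell_j, n=720):
    """Largest difference between the two exponent forms in (27) on a grid."""
    theta = np.linspace(-np.pi, np.pi, n)
    mu, ell = circular_product(mu_i, ell_i, mu_j, ell_j)
    lhs = np.cos(theta - mu_i) / ell_i**2 + np.cos(theta - mu_j) / ell_j**2
    rhs = np.cos(theta - mu) / ell**2
    return float(np.max(np.abs(lhs - rhs)))


# Example: a broad prior at 90 degrees updated by a sharper observation at 30 degrees.
mu_post, ell_post = circular_product(np.radians(30.0), 0.5, np.radians(90.0), 1.0)
print(np.degrees(mu_post), ell_post)
```

When the two means coincide, the update reduces to $\ell_{ij}^{2}=\ell_{i}^{2}\ell_{j}^{2}/(\ell_{i}^{2}+\ell_{j}^{2})$ , the familiar precision-additive form. Means that are $\pi$ apart with equal dispersions cancel exactly and make the denominator in (28) vanish, so a practical implementation would need to guard that case.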
Plane-wave Covariance: Let $p(\bm{r})=e^{-jk\bm{v}^{T}\bm{r}}$ be the $2$ D plane-wave equation with incident direction $\bm{v}$ , and let the region of integration be the $2$ D disc of radius $R$ where $\bm{r}=\left[{x,y}\right]^{T}$ , $-R≤ x≤ R$ , $\left|{y}\right|≤\sqrt{R^{2}-x^{2}}$ . We may express the incident direction and evaluation point in polar coordinates as follows: $$ \begin{split}\bm{v}&=\left\lVert\bm{v}\right\rVert\left[{\cos\theta_{v},\,\sin\theta_{v}}\right]^{T},\quad\bm{r}=r\left[{\cos\theta,\,\sin\theta}\right]^{T},\\ \bm{v}^{T}\bm{r}&=\left\lVert\bm{v}\right\rVert r\left({\cos\theta_{v}\cos\theta+\sin\theta_{v}\sin\theta}\right)\\ &=\left\lVert\bm{v}\right\rVert r\cos\left({\theta-\theta_{v}}\right),\\ -\bm{v}^{T}\bm{r}&=\left\lVert\bm{v}\right\rVert r\cos\left({\pi-\theta+\theta_{v}}\right),\quad\textrm{ Cosine reflection}\end{split} \tag{29} $$ where $0≤ r≤ R$ and $0≤\theta≤ 2\pi$ . Integrating over the full period in $\theta$ removes the dependence on the plane-wave incident angle $\theta_{v}$ in the subsequent integrals. The first moment has a closed form in terms of the Bessel function of the first kind $J_{n}(x)$ and is given by $$ \begin{split}\mathbb{E}\left[{p(\bm{r})}\right]&=\frac{1}{\pi R^{2}}\int_{0}^{2\pi}\int_{0}^{R}re^{-jk\bm{v}^{T}\bm{r}}\,dr\,d\theta\\ &=\frac{1}{\pi R^{2}}\int_{0}^{R}r\int_{0}^{2\pi}e^{jk\left\lVert\bm{v}\right\rVert r\cos\left({\pi-\theta+\theta_{v}}\right)}\,d\theta\,dr\\ &=\frac{2}{R^{2}}\int_{0}^{R}rJ_{0}(k\left\lVert\bm{v}\right\rVert r)\,dr\quad\textrm{Hansen-Bessel [31]}\\ &=\frac{2J_{1}(k\left\lVert\bm{v}\right\rVert R)}{k\left\lVert\bm{v}\right\rVert R}.\quad\textrm{Bessel integral identity [32]}\end{split} \tag{30} $$ Let $p_{n}(\bm{r})=e^{-jk\bm{v}_{n}^{T}\bm{r}}$ be the plane-wave equation of the $n^{th}$ loudspeaker.
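The closed form in (30) can be sanity-checked by brute-force quadrature over the disc. The sketch below uses only NumPy. It evaluates $J_{1}$ from Bessel's integral $J_{1}(x)=\frac{1}{\pi}\int_{0}^{\pi}\cos(\tau-x\sin\tau)\,d\tau$ with a midpoint rule, and compares $2J_{1}(k\lVert\bm{v}\rVert R)/(k\lVert\bm{v}\rVert R)$ against a polar-grid average of the plane wave. The function names, grid sizes, and the example frequency and radius are assumptions chosen for illustration.

```python
import numpy as np


def bessel_j1(x, n=2000):
    """J_1(x) via Bessel's integral, midpoint rule on [0, pi]."""
    tau = (np.arange(n) + 0.5) * np.pi / n
    return float(np.mean(np.cos(tau - x * np.sin(tau))))


def disc_mean_plane_wave(k, v, R, nr=400, nth=400):
    """Average of exp(-j k v^T r) over a disc of radius R (midpoint polar grid)."""
    r = (np.arange(nr) + 0.5) * R / nr
    th = (np.arange(nth) + 0.5) * 2.0 * np.pi / nth
    rr, tt = np.meshgrid(r, th, indexing="ij")
    phase = k * (v[0] * rr * np.cos(tt) + v[1] * rr * np.sin(tt))
    integrand = rr * np.exp(-1j * phase)  # the factor r is the polar Jacobian
    cell = (R / nr) * (2.0 * np.pi / nth)
    return integrand.sum() * cell / (np.pi * R**2)


def closed_form(k, v, R):
    """Right-hand side of (30), with the limit 1 for a zero argument."""
    a = k * np.linalg.norm(v) * R
    return 1.0 if a == 0.0 else 2.0 * bessel_j1(a) / a


# Example values (assumptions): 1 kHz in air, 0.2 m listening radius.
k = 2.0 * np.pi * 1000.0 / 343.0
v = np.array([np.cos(0.7), np.sin(0.7)])
numeric = disc_mean_plane_wave(k, v, 0.2)
print(numeric.real, closed_form(k, v, 0.2))
```

The imaginary part of the numerical average vanishes by symmetry, and the real part does not change with the incident angle, consistent with the remark that the dependence on $\theta_{v}$ drops out.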
The second moment follows by substituting $\bm{v}=\bm{v}_{m}-\bm{v}_{n}$ in (30): $$ \begin{split}\mathbb{E}\left[{p_{m}(\bm{r})p_{n}^{*}(\bm{r})}\right]&=\frac{1}{\pi R^{2}}\int_{0}^{2\pi}\int_{0}^{R}re^{-jk\left({\bm{v}_{m}-\bm{v}_{n}}\right)^{T}\bm{r}}\,dr\,d\theta\\ &=\left\{\begin{array}{ll}\frac{2J_{1}\left({k\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert R}\right)}{k\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert R},&\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert>0\\ 1,&\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert=0\end{array}\right..\end{split} \tag{31} $$

References

- Recommendation [2022] Recommendation, I., “ITU-R Recommendation BS.775-4,” Multi-channel stereophonic sound system with or without accompanying picture, 2022.
- De Sena et al. [2013] De Sena, E., Hacıhabiboğlu, H., and Cvetković, Z., “Analysis and design of multichannel systems for perceptual sound field reconstruction,” IEEE Trans. Audio Speech Lang. Process., 21(8), pp. 1653–1665, 2013.
- López et al. [2005] López, J. J., González, A., and Fuster, L., “Room compensation in wave field synthesis by means of multichannel inversion,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 146–149, IEEE, 2005.
- Brännmark et al. [2013] Brännmark, L.-J., Bahne, A., and Ahlén, A., “Compensation of loudspeaker–room responses in a robust MIMO control framework,” IEEE Trans. Audio Speech Lang. Process., 21(6), pp. 1201–1216, 2013.
- Jin and Kleijn [2015] Jin, W. and Kleijn, W. B., “Theory and design of multizone soundfield reproduction using sparse methods,” IEEE/ACM Trans. Audio Speech Lang. Process., 23(12), pp. 2343–2355, 2015.
- Talagala et al. [2014] Talagala, D. S., Zhang, W., and Abhayapala, T. D., “Efficient multi-channel adaptive room compensation for spatial soundfield reproduction using a modal decomposition,” IEEE/ACM Trans. Audio Speech Lang. Process., 22(10), pp. 1522–1532, 2014.
- Kolundzija et al. [2011] Kolundzija, M., Faller, C., and Vetterli, M., “Reproducing sound fields using MIMO acoustic channel inversion,” Journal of the Audio Engineering Society, 59(10), pp. 721–734, 2011.
- Poulose et al. [2019] Poulose, A., Eyobu, O. S., and Han, D. S., “An indoor position-estimation algorithm using smartphone IMU sensor data,” IEEE Access, 7, pp. 11165–11177, 2019.
- Won et al. [2009] Won, S.-h. P., Melek, W. W., and Golnaraghi, F., “A Kalman/particle filter-based position and orientation estimation method using a position sensor/inertial measurement unit hybrid system,” IEEE Transactions on Industrial Electronics, 57(5), pp. 1787–1798, 2009.
- Daniş and Cemgil [2017] Daniş, F. S. and Cemgil, A. T., “Model-based localization and tracking using bluetooth low-energy beacons,” Sensors, 17(11), p. 2484, 2017.
- Wang et al. [2013] Wang, Y., Yang, X., Zhao, Y., Liu, Y., and Cuthbert, L., “Bluetooth positioning using RSSI and triangulation methods,” in 2013 IEEE 10th Consumer Communications and Networking Conference (CCNC), pp. 837–842, IEEE, 2013.
- Kim et al. [2016] Kim, K.-W., Kwon, J., Lee, C.-G., and Han, J., “Accurate indoor location tracking exploiting ultrasonic reflections,” IEEE Sensors Journal, 16(24), pp. 9075–9088, 2016.
- Greenfield and Hawksford [1991] Greenfield, R. and Hawksford, M. J., “Efficient filter design for loudspeaker equalization,” Journal of the Audio Engineering Society, 39(10), pp. 739–751, 1991.
- Mäkivirta et al. [2018] Mäkivirta, A., Liski, J., and Välimäki, V., “Modeling and delay-equalizing loudspeaker responses,” Journal of the Audio Engineering Society, 66(11), pp. 922–934, 2018.
- Pulkki [1997] Pulkki, V., “Virtual sound source positioning using vector base amplitude panning,” Journal of the Audio Engineering Society, 45(6), pp. 456–466, 1997.
- Pulkki et al. [2001] Pulkki, V. et al., Spatial sound generation and perception by amplitude panning techniques, Ph.D. dissertation, Helsinki University of Technology, 2001.
- Pulkki [1999] Pulkki, V., “Uniform spreading of amplitude panned virtual sources,” in Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 187–190, IEEE, 1999.
- Franck et al. [2017] Franck, A., Fazi, F. M., and Hamdan, E., “An optimization approach to control sound source spread with multichannel amplitude panning,” in 24th International Congress on Sound and Vibration, 2017.
- Luo [2023] Luo, Y., “Active Barycentric Beamformed Stereo Upmixing,” in 2023 31st European Signal Processing Conference (EUSIPCO), pp. 316–320, 2023.
- Sadek and Kyriakakis [2004] Sadek, R. and Kyriakakis, C., “A novel multichannel panning method for standard and arbitrary loudspeaker configurations,” 2004.
- Batke and Keiler [2010] Batke, J.-M. and Keiler, F., “Investigation of Robust Panning Functions for 3-D Loudspeaker Setups,” in Audio Engineering Society Convention 128, Audio Engineering Society, 2010.
- Oppenheim [1999] Oppenheim, A. V., Discrete-time signal processing, Pearson Education India, 1999.
- Alizadeh and Goldfarb [2003] Alizadeh, F. and Goldfarb, D., “Second-order cone programming,” Mathematical Programming, 95(1), pp. 3–51, 2003.
- Horn and Johnson [1990] Horn, R. and Johnson, C., Matrix Analysis, Cambridge University Press, 1990, ISBN 9780521386326.
- Sanathanan and Koerner [2003] Sanathanan, C. and Koerner, J., “Transfer function synthesis as a ratio of two complex polynomials,” IEEE Trans. Autom. Control, 8(1), pp. 56–58, 2003.
- Lai and Lin [2010] Lai, X. and Lin, Z., “Minimax design of IIR digital filters using a sequential constrained least-squares method,” IEEE Trans. Signal Process., 58(7), pp. 3901–3906, 2010.
- Toole [2006] Toole, F. E., “Loudspeakers and rooms for sound reproduction—A scientific review,” Journal of the Audio Engineering Society, 54(6), pp. 451–476, 2006.
- Luo [2021] Luo, Y., “Spherical harmonic covariance and magnitude function encodings for beamformer design,” EURASIP Journal on Audio, Speech, and Music Processing, 2021, pp. 1–17, 2021.
- Lehmann and Johansson [2008] Lehmann, E. A. and Johansson, A. M., “Prediction of energy decay in room impulse responses simulated with an image-source model,” The Journal of the Acoustical Society of America, 124(1), pp. 269–277, 2008.
- Weisstein [2025] Weisstein, E. W., “Cosine Integral,” https://mathworld.wolfram.com/CosineIntegral.html, 2025, From MathWorld–A Wolfram Web Resource.
- Temme [1996] Temme, N. M., Special Functions: An Introduction to the Classical Functions of Mathematical Physics, pp. 228–231, Wiley-Interscience, 1996.
- Weisstein [2023] Weisstein, E. W., “Bessel Function of the First Kind,” https://mathworld.wolfram.com/BesselFunctionoftheFirstKind.html, 2023, From MathWorld–A Wolfram Web Resource.
