2510.23937v1

# Optimized Loudspeaker Panning for Adaptive Sound-Field Correction and Non-stationary Listening Areas

**Authors**: Yuancheng Luo (luoyuancheng@gmail.com)

Surround sound systems commonly distribute loudspeakers along standardized layouts for multichannel audio reproduction. However, in less controlled environments, practical layouts vary in loudspeaker quantity, placement, and listening locations / areas. Deviations from standard layouts introduce sound-field errors that degrade acoustic timbre, imaging, and clarity of audio content reproduction. This work introduces both Bayesian loudspeaker normalization and content panning optimization methods for sound-field correction. Conjugate prior distributions over loudspeaker-listener directions update estimated layouts for non-stationary listening locations; digital filters adapt loudspeaker acoustic responses to a common reference target at the estimated listening area without acoustic measurements. Frequency-domain panning coefficients are then optimized via sensitivity / efficiency objectives subject to spatial, electrical, and acoustic domain constraints; normalized and panned loudspeakers form virtual loudspeakers in standardized layouts for accurate multichannel reproduction. Experiments investigate robustness of Bayesian adaptation, and panning optimizations in practical applications.

## 1 Introduction

Surround sound systems for multichannel audio reproduction have risen in popularity in home theater setups that accommodate proper loudspeaker selection, layout, acoustic room treatment, and calibration established by the International Telecommunication Union (ITU) standards [1]. Conversely, the same accommodations present a barrier to entry for extemporary arrangements where loudspeakers differ in quality and placement, and operate in changeable listening locations / areas and reverberant environments.
Deviating from the standards degrades accurate reproduction of multichannel audio content as intended by the content authors. Therefore, methods from sound-field control and reconstruction correct for the effects of irregular loudspeaker placements and room reverberation in the listening area via acoustic measurement system inversion [2, 3, 4], and modal / planewave decomposition [5, 6, 7]; such methods, however, are inapplicable when acoustic measurements are unavailable. In the absence of acoustic measurements, other sensing modalities can infer the loudspeaker layout and listening area location. Inertial measurement unit [8, 9] and Bluetooth Low Energy [10, 11] indoor tracking can estimate changes in loudspeaker position and orientation. Ultrasound [12], camera, and video can track in-room listener and loudspeaker positions within fields-of-view. Such metadata yields a 2D layout of the estimated loudspeaker placements, listening location, and a front direction. We therefore reproduce multichannel content at the listener’s area by incorporating Bayesian uncertainty of the estimated layout inputs with loudspeaker distance and orientation normalization [13, 14] to the listener, and then reformulate conventional amplitude panning methods [15, 16, 17] in terms of constrained optimization along joint spatial [18, 19], electrical [20], and acoustical [21] domains.

The paper is organized as follows: Section 2 introduces our normalization method for aligning loudspeaker acoustic transfer functions in an arbitrary layout to a common axial-reference target at the listener location; acoustic delay and attenuation compensate for varying loudspeaker-listener distances whereas minimum-phase and all-pass factorizations [22] normalize for loudspeaker orientations relative to listener locations.
We integrate estimates of the loudspeaker-listener normalization directions via Bayesian posterior updates of a novel circular distribution conjugate prior, and provide a sample calibration for a sequence of normalization angles. Section 3 presents our novel normalized loudspeaker panning optimization, which solves for frequency-dependent magnitude-gains that satisfy spatial vector-bases, electrical headroom, and acoustic power constraints; we augment the former vector-base amplitude panning with slack (VBAPS) to accommodate constraints in electric and acoustic domains. Next, we derive a panning sensitivity / efficiency objective from the augmented form that measures panned-source discreteness, and give equivalent primary and null-space formulations in fewer variables. Planewave acoustic covariances model anechoic to diffuse-field assumptions for variable sized listening areas. Optimal solutions are found via a second-order cone program [23]. Section 4 applies our model to several practical applications of loudspeaker correction under varying constraints. For high loudness targets, we find optimal gains across loudspeakers for overdriven content that maximize source discreteness. For anechoic to diffuse-field environments, we show that our panning optimization solutions converge from discrete panning to Rayleigh quotient maximizers [24]. For circular-panning over varying loudspeaker layouts, we evaluate panning sensitivity across azimuth steering-angles and recommend preferred layouts for different numbers of loudspeakers. Section 5 discusses results and future work.

## 2 Loudspeaker Normalization

Let $S(\nu,\theta)$ be the loudspeaker’s electrical-acoustical transfer function at frequency $\nu$ measured at $1$ meter distance along azimuth $\theta$ (radians) in the horizontal plane, with the acoustic path-delay removed. Under far-field assumptions, the loudspeaker frequency response attenuates by the inverse-distance and undergoes pure-delay.
It is useful to express the far-field transfer function along a listener-centric coordinate frame, which centers the origin at the listener’s location and aligns the $+x$ axis with the listener’s facing direction. The acoustic transfer function $H_{n}(\nu,\bm{r})$ at coordinate $\bm{r}\in\mathbb{R}^{2\times 1}$ for the $n^{th}$ loudspeaker located at coordinate $\bm{u}_{n}\in\mathbb{R}^{2\times 1}$ with the orientation unit-vector $\bm{o}_{n}\in\mathbb{R}^{2\times 1}$ follows

$$ \begin{split}H_{n}(\nu,\bm{r})&=S\left({\nu,\theta_{n}(\bm{r})}\right)\frac{e^{-j\kappa\left\lVert\bm{s}_{n}(\bm{r})\right\rVert}}{\left\lVert\bm{s}_{n}(\bm{r})\right\rVert},\quad\kappa=\frac{2\pi\nu}{c},\\ \theta_{n}(\bm{r})&=\cos^{-1}\left({\frac{\bm{o}_{n}^{T}\bm{s}_{n}(\bm{r})}{\left\lVert\bm{s}_{n}(\bm{r})\right\rVert}}\right),\quad\bm{s}_{n}(\bm{r})=\bm{r}-\bm{u}_{n},\end{split} \tag{1} $$

where $\kappa$ is the angular wavenumber, $c$ is the speed of sound in meters/second, $\bm{s}_{n}(\bm{r})$ is the evaluation direction relative to the loudspeaker’s location, and $\theta_{n}(\bm{r})$ is the evaluation angle relative to the loudspeaker’s orientation.

We can normalize the loudspeaker’s transfer function to approximate the original loudspeaker’s response $S(\nu,\theta)$ within a listening window at the listener’s location $\bm{r}=\bm{0}$. Consider the following decomposition of the loudspeaker transfer function $S(\nu,\theta)=S_{E}(\nu)S_{A}(\nu,\theta)$ into acoustical and electrical domain transfer functions $S_{A}(\nu,\theta)$ and $S_{E}(\nu)$ respectively.
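To make the geometry in (1) concrete, the following minimal sketch evaluates $H_{n}(\nu,\bm{r})$ for a single loudspeaker. The directivity `toy_directivity` is a hypothetical stand-in for $S(\nu,\theta)$ (a first-order pattern that narrows with frequency), and the speed of sound `C_SOUND` is an assumed value; neither comes from the paper.

```python
import cmath
import math

C_SOUND = 343.0  # assumed speed of sound in m/s


def toy_directivity(nu, theta):
    """Hypothetical stand-in for S(nu, theta); not a measured response."""
    a = min(1.0, nu / 20000.0)  # 0 = omnidirectional, 1 = cardioid
    return complex((1.0 - a) + a * 0.5 * (1.0 + math.cos(theta)))


def far_field_transfer(nu, r, u_n, o_n, directivity=toy_directivity):
    """Evaluate H_n(nu, r) and theta_n(r) from (1) for 2D points r and u_n."""
    s = (r[0] - u_n[0], r[1] - u_n[1])            # s_n(r) = r - u_n
    dist = math.hypot(s[0], s[1])                 # ||s_n(r)||
    cos_theta = (o_n[0] * s[0] + o_n[1] * s[1]) / dist
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against round-off
    kappa = 2.0 * math.pi * nu / C_SOUND          # angular wavenumber
    h = directivity(nu, theta) * cmath.exp(-1j * kappa * dist) / dist
    return h, theta


# Loudspeaker 2 m in front of the listener, aimed straight at the origin.
H, theta = far_field_transfer(1000.0, (0.0, 0.0), (2.0, 0.0), (-1.0, 0.0))
```

In this on-axis case the evaluation angle is zero, so only the inverse-distance attenuation and the path delay remain in `H`.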
A filter with frequency response $G_{n}(\nu)$ that normalizes (1) to the loudspeaker’s on-axis response $H_{n}(\nu,\bm{0})G_{n}(\nu)=S(\nu,0)$ is given by

$$ \begin{split}G_{n}(\nu)=Q_{n}(\nu)\left\lVert\bm{u}_{n}\right\rVert e^{j\kappa\left\lVert\bm{u}_{n}\right\rVert},\quad Q_{n}(\nu)=\frac{S_{A}(\nu,0)}{S_{A}\left({\nu,\bm{\bar{\theta}}_{n}}\right)},\end{split} \tag{2} $$

where $\bm{\bar{\theta}}_{n}=\theta_{n}(\bm{0})$ is the normalization angle between the loudspeaker’s orientation and the listener. The electrical domain term $S_{E}(\nu)$ cancels within the quotient $Q_{n}(\nu)$ in (2), thereby negating prior signal processing in loudspeaker playback. $Q_{n}(\nu)$ is therefore the acoustic relative-transfer-function between the loudspeaker’s axial and listener-direction acoustic responses. Moreover, if $S_{A}(\nu,\theta)$ shares a common acoustic delay and the remainder is minimum-phase for bounded $\theta$ that defines a listening window, then $Q_{n}(\nu)$ must also be minimum-phase. Thus, the normalization filter $G_{n}(\nu)$ compensates for both the loudspeaker’s orientation and distance relative to the listener as shown in Fig. 1.

<details> <summary>figs/pretransform.png Details</summary> ![cc3e9781](/v1/image/cc3e9781ddff15f117c22f51946fea8b9b05400a69f1dc25d95e24c7b7346d8e) ### Visual Description \n ## Diagram: Sound Source and Receiver Geometry ### Overview The image is a diagram illustrating the geometry of a sound source and a receiver within a spherical coordinate system. It depicts a sound source emitting waves towards a receiver (represented by a head silhouette), with annotations defining vectors, distances, and a spherical boundary. The diagram appears to be related to acoustic modeling or sound field analysis. ### Components/Axes The diagram includes the following components: * **Sound Source:** Represented by a speaker icon on the left side of the diagram. * **Receiver:** Represented by a head silhouette on the right side of the diagram.
* **Spherical Boundary:** A dashed circle surrounding the sound source and receiver, labeled "1 meter". * **Vectors:** * `u_n`: A green vector pointing from the sound source towards the center of the sphere. * `o_n`: A light blue vector pointing from the sound source towards the receiver. * `v_n`: A red vector, representing the normalized `u_n` vector. * **Distance:** `r`, representing the distance from the sound source to the receiver. * **Angle:** `θ_n(r)`, representing the angle between `o_n` and `u_n`. * **Equations:** * `s_n(r) = r - u_n` * `S_A(v, θ = 0)` * `S_A(v, θ_n(r))` * **Point:** `r = 0` at the receiver location. ### Detailed Analysis / Content Details The diagram defines a coordinate system centered on the sound source. The spherical boundary has a radius of approximately 1 meter. * **Vector `u_n`:** Points radially outward from the sound source. Its length is not explicitly defined, but it appears to be a unit vector based on the normalization to `v_n`. * **Vector `v_n`:** Is defined as `u_n` normalized by its magnitude: `v_n = u_n / ||u_n||`. This implies `v_n` is a unit vector in the same direction as `u_n`. * **Vector `o_n`:** Points from the sound source to the receiver. * **Angle `θ_n(r)`:** Is the angle between the vectors `o_n` and `u_n`. It is a function of the distance `r`. * **Equation `s_n(r) = r - u_n`:** Represents a vector difference between the distance `r` and the vector `u_n`. * **Equations `S_A(v, θ = 0)` and `S_A(v, θ_n(r))`:** These equations likely represent some acoustic property (possibly sound pressure or intensity) as a function of velocity `v` and angle `θ`. The first equation is evaluated at `θ = 0`, and the second at `θ_n(r)`. * **Point `r = 0`:** Indicates that the receiver is located at the origin of the coordinate system, relative to the sound source. ### Key Observations The diagram focuses on the relationship between the sound source, receiver, and the direction of sound propagation. 
The use of spherical coordinates suggests an analysis of sound fields in a three-dimensional space. The equations `S_A` suggest a model for calculating some acoustic property based on the angle of incidence. ### Interpretation This diagram likely represents a simplified model for analyzing sound propagation from a source to a receiver. The equations and vectors are used to define the geometry and potentially calculate the sound pressure or intensity at the receiver. The normalization of `u_n` to `v_n` suggests a focus on the direction of sound propagation rather than its magnitude. The equations `S_A` are likely part of a larger model for predicting sound fields, potentially used in applications such as room acoustics, noise control, or audio engineering. The diagram is a conceptual representation and does not provide specific numerical data, but rather defines the relationships between the variables involved in the acoustic model. The use of `r=0` at the receiver suggests a coordinate transformation where the receiver is the origin. The diagram is a foundational element for understanding the mathematical framework used to model sound propagation. </details> Figure 1: Acoustic transfer function $G_{n}(\nu)$ in (2) normalizes the direct acoustic path between the listener and loudspeaker at $\bm{u}_{n}$ to be its on-axis response $S_{A}(\nu,0)$ at the normalized coordinate $\bm{v}_{n}$ . In practice, we can find the rational function approximation [25, 26] to $Q_{n}(\nu)$ , expressed in terms of minimum-phase $\mathbb{M}_{n}(\nu)$ and all-pass $\mathbb{A}_{n}(\nu)$ transfer functions given by $$ \begin{split}Q_{n}(\nu)\approx\mathbb{M}_{n}(\nu)\mathbb{A}_{n}(\nu),\quad\mathbb{A}_{n}(\nu)=\bar{\mathbb{A}}_{n}(\nu)\ddot{\mathbb{A}}_{n}(\nu),\end{split} \tag{3} $$ where $\bar{\mathbb{A}}_{n}(\nu)$ and $\ddot{\mathbb{A}}_{n}(\nu)$ are all-pass transfer functions belonging to stable and unstable components respectively. 
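Before the factorization in (3) is taken further, the ideal identity behind (2), $H_{n}(\nu,\bm{0})G_{n}(\nu)=S(\nu,0)$, can be checked numerically. The sketch below is illustrative only: `s_a` and `s_e` are hypothetical acoustic and electrical responses, and `C_SOUND` is an assumed speed of sound.

```python
import cmath
import math

C_SOUND = 343.0  # assumed speed of sound in m/s


def s_a(nu, theta):
    """Hypothetical acoustic directivity S_A(nu, theta); not a measured response."""
    a = min(0.8, nu / 20000.0)  # capped so the pattern never reaches a null
    return complex((1.0 - a) + a * 0.5 * (1.0 + math.cos(theta)))


def s_e(nu):
    """Hypothetical electrical response S_E(nu): a first-order low-pass."""
    return 1.0 / (1.0 + 1j * nu / 18000.0)


def normalization_angle(u_n, o_n):
    """theta_n(0): angle between o_n and the direction to the listener, -u_n."""
    dist = math.hypot(u_n[0], u_n[1])
    cos_t = -(o_n[0] * u_n[0] + o_n[1] * u_n[1]) / dist
    return math.acos(max(-1.0, min(1.0, cos_t)))


def normalization_filter(nu, u_n, o_n):
    """G_n(nu) from (2): directivity quotient, distance gain, delay removal."""
    dist = math.hypot(u_n[0], u_n[1])
    kappa = 2.0 * math.pi * nu / C_SOUND
    q_n = s_a(nu, 0.0) / s_a(nu, normalization_angle(u_n, o_n))
    return q_n * dist * cmath.exp(1j * kappa * dist)


def transfer_at_listener(nu, u_n, o_n):
    """H_n(nu, 0) from (1), evaluated at the listener location r = 0."""
    dist = math.hypot(u_n[0], u_n[1])
    kappa = 2.0 * math.pi * nu / C_SOUND
    theta_bar = normalization_angle(u_n, o_n)
    return s_e(nu) * s_a(nu, theta_bar) * cmath.exp(-1j * kappa * dist) / dist


# Loudspeaker at 2.5 m, aimed along -x rather than directly at the listener.
u_n, o_n, nu = (1.5, 2.0), (-1.0, 0.0), 4000.0
lhs = transfer_at_listener(nu, u_n, o_n) * normalization_filter(nu, u_n, o_n)
rhs = s_e(nu) * s_a(nu, 0.0)  # on-axis response S(nu, 0)
```

The electrical term `s_e` appears on both sides and never enters `normalization_filter`, which mirrors the cancellation of $S_{E}(\nu)$ inside $Q_{n}(\nu)$. The ideal filter is non-causal because it advances the signal by the path delay.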
The unstable all-pass $\ddot{\mathbb{A}}_{n}(\nu)$ contains the reciprocal poles and zeros of the Padé approximant outside the complex unit-circle, and is ideally empty or low-order for $\theta$ in the listening window. We can realize a causal-stable filter-response $G_{n}(\nu)$ for an all-passed loudspeaker transfer function in (2) as follows:

$$ \begin{split}H_{n}(\nu,\bm{0})G_{n}(\nu)=S(\nu,0)\frac{e^{-j\kappa d}}{\ddot{\mathbb{A}}_{lcm}(\nu)}\quad\Rightarrow\quad\\ G_{n}(\nu)=\mathbb{M}_{n}(\nu)\bar{\mathbb{A}}_{n}(\nu)\frac{\ddot{\mathbb{A}}_{n}(\nu)}{\ddot{\mathbb{A}}_{lcm}(\nu)}\left\lVert\bm{u}_{n}\right\rVert e^{j\kappa\left({\left\lVert\bm{u}_{n}\right\rVert-d}\right)},\end{split} \tag{4} $$

where $d=\max_{1\leq n\leq N}\left\lVert\bm{u}_{n}\right\rVert$ is the furthest loudspeaker distance, and $\ddot{\mathbb{A}}_{lcm}(\nu)$ is the transfer function of the set of least common multiple (LCM) reciprocal poles and zeros across the unstable all-passes $\left\{{\ddot{\mathbb{A}}_{1}(\nu),\ldots,\ddot{\mathbb{A}}_{N}(\nu)}\right\}$. In the $z$-domain, we can therefore express the all-pass and LCM transfer functions as follows:

$$ \begin{split}\ddot{\mathbb{A}}_{n}(z)&=\prod_{p\in P_{n}}\left({\frac{1-p^{*}z}{1-pz^{-1}}}\right)^{k_{pn}},\quad P_{n}=\left\{{p_{1n},\ldots,p_{M_{n}n}}\right\},\\ \ddot{\mathbb{A}}_{lcm}(z)&=\prod_{p\in P}\left({\frac{1-p^{*}z}{1-pz^{-1}}}\right)^{\max\limits_{1\leq n\leq N}k_{pn}},\quad P=\cup_{n=1}^{N}P_{n},\end{split} \tag{5} $$

where $p^{*}$ is the complex conjugate of $p$, $P_{n}$ is the set of unique poles, and $k_{pn}$ is the multiplicity of pole $p$ for the $n^{th}$ loudspeaker. By taking the maximum multiplicity for each unique and unstable pole across all $\ddot{\mathbb{A}}_{n}(z)$, and dividing by the subsequent LCM $\ddot{\mathbb{A}}_{lcm}(z)$, the unstable poles in $\ddot{\mathbb{A}}_{n}(z)$ cancel and the remaining all-pass adds minimal additional group-delay in $G_{n}(\nu)$.
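The LCM construction in (5) amounts to taking, for each unstable pole, the largest multiplicity seen across loudspeakers. The sketch below illustrates this bookkeeping under simplifying assumptions: poles are matched by exact equality (a practical implementation would likely need a tolerance), and the pole values are made up for illustration.

```python
from collections import Counter
import cmath


def lcm_poles(pole_sets):
    """Maximum multiplicity of each unique pole across all loudspeakers, as in (5)."""
    lcm = Counter()
    for poles in pole_sets:
        for p, k in poles.items():
            lcm[p] = max(lcm[p], k)
    return lcm


def allpass(z, poles):
    """Evaluate prod_p ((1 - conj(p) z) / (1 - p / z)) ** k at a point z."""
    value = 1.0 + 0.0j
    for p, k in poles.items():
        value *= ((1.0 - p.conjugate() * z) / (1.0 - p / z)) ** k
    return value


def residual_poles(poles_n, lcm):
    """Non-trivial poles of A_n / A_lcm.

    Each leftover factor (1 - p / z) / (1 - conj(p) z) has a pole at
    1 / conj(p), which lies inside the unit circle when |p| > 1, plus a
    pole at the origin.
    """
    return Counter({1.0 / p.conjugate(): lcm[p] - poles_n.get(p, 0)
                    for p in lcm if lcm[p] > poles_n.get(p, 0)})


# Hypothetical unstable poles (|p| > 1) with multiplicities for two loudspeakers.
P1 = Counter({1.25 + 0.0j: 1, 0.5 + 1.2j: 2})
P2 = Counter({1.25 + 0.0j: 2})
P_lcm = lcm_poles([P1, P2])

z = cmath.exp(1j * 0.7)  # a point on the unit circle
residual_1 = allpass(z, P1) / allpass(z, P_lcm)
```

On the unit circle the residual has unit magnitude, so dividing by the LCM changes only phase, and its remaining poles are the reflections of the unmatched unstable poles.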
The filtered loudspeakers’ direct paths are thus matched with a common all-passed on-axis response. Lastly, we scale the loudspeaker filter $G_{n}(\nu)$ to match the expected acoustic power at a common distance $D$, such as the median of all loudspeaker-to-listener distances, via the following room acoustic attenuation model. Let us consider the inverse-distance law $\rho_{DP}(r)=\bar{\rho}r^{-2}$ for the attenuation of the direct acoustic path response’s nominal power $\bar{\rho}$ at distance $r$ from a loudspeaker. In a room environment, let $\rho_{IP}(r)$ be the total power of indirect acoustic paths at distance $r$. We can model the direct-to-indirect power ratio at $r$ and the total power as follows:

$$ \begin{split}\frac{\rho_{DP}(r)}{\rho_{IP}(r)}&=\left({\frac{d_{c}}{r}}\right)^{2\beta},\quad\beta=10^{\frac{\gamma\textrm{ dB/dd}}{10}},\quad\textrm{Attenuation rate}\\ \rho(r)&=\rho_{DP}(r)+\rho_{IP}(r)=\bar{\rho}r^{-2}\left({1+\left({\frac{r}{d_{c}}}\right)^{2\beta}}\right),\end{split} \tag{6} $$

where $d_{c}$ is the so-called critical distance (meters) where the direct and indirect acoustic powers are equivalent, and $\beta$ is a decay-rate parameterized by $\gamma$ decibels (dB) per double-distance (dd); typical $\gamma\in\left\{{0,-3}\right\}$ and $0.5\leq d_{c}\leq 1.5$ span idealized concert-hall to small-room spaces [27]. Normalizing the power at distance $r$ to $D$ therefore follows

$$ \begin{split}F(r,\,D,\,d_{c})=\sqrt{\frac{\rho(D)}{\rho(r)}}=\frac{r}{D}\sqrt{\frac{d_{c}^{2\beta}+D^{2\beta}}{d_{c}^{2\beta}+r^{2\beta}}},\end{split} \tag{7} $$

whereby substituting $\left\lVert\bm{u}_{n}\right\rVert$ with $F(\left\lVert\bm{u}_{n}\right\rVert,\,D,\,d_{c})$ in (4) compensates for loudspeaker distances to the listener in a room.
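As a quick illustration of (7), the sketch below computes the distance gain $F(r,D,d_{c})$ and applies it to a hypothetical set of loudspeaker distances, using their median as the common distance $D$. The distances and the chosen $d_{c}$ are illustrative values, not taken from the paper.

```python
import math
import statistics


def beta_from_gamma(gamma_db_per_dd):
    """Decay-rate exponent beta = 10 ** (gamma / 10), as defined alongside (6)."""
    return 10.0 ** (gamma_db_per_dd / 10.0)


def distance_gain(r, D, d_c, gamma_db_per_dd=0.0):
    """Gain F(r, D, d_c) from (7) that matches the power at distance r to that at D."""
    b2 = 2.0 * beta_from_gamma(gamma_db_per_dd)
    return (r / D) * math.sqrt((d_c ** b2 + D ** b2) / (d_c ** b2 + r ** b2))


# Hypothetical loudspeaker-to-listener distances in meters.
distances = [1.2, 2.0, 2.0, 3.5, 4.1]
D = statistics.median(distances)
gains = [distance_gain(r, D, d_c=1.0, gamma_db_per_dd=-3.0) for r in distances]
```

Two limits are useful sanity checks. For a very large critical distance the room is effectively anechoic and the gain reduces to $r/D$; for a very small critical distance with $\gamma=0$ the indirect power dominates and is independent of distance, so the gain tends to one.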
Model Uncertainty for Non-stationary Targets: In instances where the listener’s location changes over time or requires online estimation, we normalize the loudspeaker via the mean listener distance $\frac{1}{T}\int_{0}^{T}\left\lVert\bm{u}_{n}(t)\right\rVert dt$, and treat the normalization angle $\bm{\bar{\theta}}_{n}$ relative to the loudspeaker orientation $\bm{o}_{n}$ in (2) as a random variable. The target transfer function $G_{n}(\nu)$ and quotient term $Q_{n}(\nu)$ are re-defined to minimize the expected squared-differences between the anechoic responses $S_{A}(\nu,\theta)$ sampled over axial-centered and loudspeaker-listener centered circular probability distribution functions (PDFs) $f_{0}(\theta)$ and $f_{n}(\theta)$, $\forall 1\leq n\leq N$ respectively; circular PDFs satisfy $f(\theta)=f(\theta+2\pi k)$, $\forall k\in\mathbb{Z}$. We present two acoustic averages:

$$ \begin{split}\bar{S}_{A}(\nu,f(\theta))&=\mathbb{E}\left[{S_{A}(\nu,\theta)}\right]=\int S_{A}(\nu,\theta)f(\theta)d\theta,\\ \hat{S}_{A}(\nu,f(\theta))&=\mathbb{E}\left[{\left|{S_{A}(\nu,\theta)}\right|^{2}}\right]=\int\left|{S_{A}(\nu,\theta)}\right|^{2}f(\theta)d\theta,\end{split} \tag{8} $$

where $\bar{S}_{A}(\nu,f(\theta))$ and $\hat{S}_{A}(\nu,f(\theta))$ are spatial windowed averages of the acoustic response and power respectively; axial window response average $\bar{S}_{A}(\nu,f_{0}(\theta))$ and power average $\hat{S}_{A}(\nu,f_{0}(\theta))$ sample from the $f_{0}(\theta)$ distribution.
The quotient term $Q_{n}(\nu)$ in (2) is replaced with the weighted least-squares minimizer of $\operatorname*{arg\,min}_{X}\int\left|{S_{A}(\nu,\theta)X-\bar{S}_{A}(\nu,f_{0}(\theta))}\right|^{2}f_{n}(\theta)d\theta$ given by

$$ \begin{split}\bar{Q}_{n}(\nu)=\bar{S}_{A}(\nu,f_{0}(\theta))\frac{\bar{S}_{A}^{*}(\nu,f_{n}(\theta))}{\hat{S}_{A}(\nu,f_{n}(\theta))},\end{split} \tag{9} $$

where $\bar{S}_{A}^{*}(\nu,f_{n}(\theta))$ is the complex conjugate, and $\bar{Q}_{n}(\nu)$ accounts for both amplitude and phase differences in the averaged responses. The analogous quotient for the spatial windowed acoustic power average follows

$$ \begin{split}\hat{Q}_{n}(\nu)&=\sqrt{\frac{\hat{S}_{A}(\nu,f_{0}(\theta))}{\hat{S}_{A}(\nu,f_{n}(\theta))}},\end{split} \tag{10} $$

where $\hat{Q}_{n}(\nu)$ has zero-phase and therefore compensates for only the amplitude. Both quotients can be efficiently evaluated if $f_{0}(\theta)$, $f_{n}(\theta)$ are both uni-modal and smooth over azimuth, have expansions along a common orthogonal basis with $S_{A}(\nu,\theta)$, and follow the contours of a listening window. Let us consider the circular distribution $f(\theta)$ defined by the squared-exponential of the chordal distance $d(\theta)$ on a unit-disk, which along with $S_{A}(\nu,\theta)$ has a series-expansion over the Legendre polynomials [28], and is normalized over the domain of all azimuth angles $-\pi\leq\theta\leq\pi$:

$$ \begin{split}f(\theta)&=\frac{e^{\frac{-d^{2}(\theta-\mu)}{2\ell^{2}}}}{2\pi e^{-\ell^{-2}}J_{0}(j\ell^{-2})},\quad d(\theta)=2\sin\left({\frac{{\theta}\textrm{ mod }{2\pi}}{2}}\right),\end{split} \tag{11} $$

where $J_{0}$ is the Bessel function of the first kind, $\mu$ is the mean azimuth, and $\ell$ is the dispersion. The function is symmetric w.r.t. its maximum $f(\mu)$ and minimum $f(\mu\pm\pi)$, infinitely differentiable in all azimuths, and its percentiles are computable via series expansion in appendix (26).
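The windowed averages in (8) and the quotients in (9) and (10) can be approximated by direct numerical integration against the circular PDF in (11). The sketch below is one such approximation and is not the series-expansion route the text describes: it uses a periodic rectangle rule, evaluates $J_{0}(jx)$ as the modified Bessel function $I_{0}(x)$ through its power series, and takes any callable `s_a(nu, theta)` as a hypothetical directivity.

```python
import math


def bessel_i0(x):
    """I0(x) = J0(jx) from its power series (adequate for moderate x)."""
    total = term = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def circular_pdf(theta, mu, ell):
    """Squared-exponential chordal PDF f(theta | mu, ell) from (11)."""
    d2 = (2.0 * math.sin(0.5 * (theta - mu))) ** 2   # squared chordal distance
    kappa = ell ** -2
    norm = 2.0 * math.pi * math.exp(-kappa) * bessel_i0(kappa)
    return math.exp(-d2 / (2.0 * ell ** 2)) / norm


def window_averages(s_a, nu, mu, ell, n=720):
    """Approximate the response and power averages in (8) on a uniform grid."""
    h = 2.0 * math.pi / n
    s_bar, s_hat = 0j, 0.0
    for i in range(n):
        th = -math.pi + i * h
        w = circular_pdf(th, mu, ell) * h
        s = complex(s_a(nu, th))
        s_bar += s * w
        s_hat += abs(s) ** 2 * w
    return s_bar, s_hat


def quotients(s_a, nu, mu_n, ell_n, ell_0=0.4396):
    """Return (Q_bar_n, Q_hat_n) from (9) and (10) with f_0 centered on-axis."""
    s_bar0, s_hat0 = window_averages(s_a, nu, 0.0, ell_0)
    s_barn, s_hatn = window_averages(s_a, nu, mu_n, ell_n)
    q_bar = s_bar0 * s_barn.conjugate() / s_hatn
    q_hat = math.sqrt(s_hat0 / s_hatn)
    return q_bar, q_hat


# An omnidirectional loudspeaker needs no correction, whatever the listener angle.
q_bar, q_hat = quotients(lambda nu, th: 1.0, 1000.0, math.radians(30.0), 0.65)
```

The rectangle rule converges quickly here because the integrand is smooth and periodic. For very small dispersions the series in `bessel_i0` needs many terms and `math.exp` may overflow, so a scaled Bessel routine would be a safer choice in that regime.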
Large dispersion $\ell$ gives a uniform distribution as $\lim_{\ell\rightarrow\infty}f(\theta)=(2\pi)^{-1}$; small dispersion gives the Dirac distribution as $\lim_{\ell\rightarrow 0}f(\theta)=\delta(\theta-\mu)$. We can bound the dispersion via design parameters characterizing a listening window’s peak such as the full-width at half-maximum (FWHM) measure:

$$ \begin{split}\frac{f(\mu)}{2}&=f\left({\mu\pm\frac{\textrm{FWHM}}{2}}\right),\quad 0\leq\textrm{FWHM}\leq 2\pi,\\ \ell&=\frac{2\sin\left({\frac{\textrm{FWHM}}{4}}\right)}{\sqrt{2\ln(2)}}\,\,\,\Rightarrow\,\,\,0\leq\ell\leq\sqrt{2/\ln(2)},\end{split} \tag{12} $$

which defines the angular width where $f(\theta)$ spans half its maximum amplitude as shown in Fig. 2. At the upper-limit FWHM $360^{\circ}$, $f(\theta)$ contains $\left\{{60.9,\,33.2,\,22.5}\right\}\%$ of its mass within the frontal intervals $\left|{\theta-\mu}\right|\leq\left\{{90,\,45,\,30}\right\}^{\circ}$ respectively. For tighter FWHM $\leq 90.22^{\circ}$ bounds, $f(\theta)$ contains the $95\%$ confidence interval in the half-space $\left|{\theta-\mu}\right|\leq 90^{\circ}$ of its mean azimuth $\mu$. For the axial-centered PDF in (8), we set the window’s FWHM to $60^{\circ}$ where $f_{0}(\theta)=f(\theta\,|\,\mu=0,\ell=0.4396)$. We now proceed with online adaptation of the normalization angles $\bm{\bar{\theta}}_{n}$ over time.

<details> <summary>figs/circular_distribution.png Details</summary> ![895ba18f](/v1/image/895ba18f521ca2b30f223008bc2e6f996a8c0176d5d50cd506f06a433ec2a7e5) ### Visual Description \n ## Chart: Circular Distribution Function ### Overview The image presents a chart illustrating a circular distribution function, specifically a Probability Density Function (PDF). The chart displays the function's behavior across a range of angles (θ) in degrees, from -180 to 180. Several percentile markers and the Full Width at Half Maximum (FWHM) are indicated on the plot.
### Components/Axes * **Title:** "Circular Distribution Function" * **X-axis:** θ (Degrees), ranging from -180 to 180. The axis is labeled "θ (Degrees)". * **Y-axis:** f(θ) PDF, ranging from 0 to 0.6. The axis is labeled "f(θ) PDF". * **Curve:** A black solid line representing the f(θ) PDF. * **Vertical Lines:** Multiple vertical dashed lines indicating percentile values and FWHM boundaries. * **Legend:** A single entry: "f(θ) PDF" associated with the black curve. * **Percentile Markers:** Labels indicating percentile values (2.5%, 5%, 13.9%, 25%, 75%, 86.1%, 95%, 97.5%). * **FWHM Marker:** Label indicating "FWHM". * **Half-maximum Marker:** Label indicating "Half-maximum". * **FWHM/2 Markers:** Labels indicating "-FWHM/2" and "+FWHM/2". ### Detailed Analysis The black curve representing f(θ) PDF is symmetrical around θ = 0. It rises from 0 at θ = -180, reaches a maximum value of approximately 0.6 at θ = 0, and then decreases back to 0 at θ = 180. * **2.5% Percentile:** Located at approximately θ = -95 degrees. * **5% Percentile:** Located at approximately θ = -85 degrees. * **13.9% Percentile:** Located at approximately θ = -65 degrees. * **25% Percentile:** Located at approximately θ = -45 degrees. * **75% Percentile:** Located at approximately θ = 45 degrees. * **86.1% Percentile:** Located at approximately θ = 65 degrees. * **95% Percentile:** Located at approximately θ = 85 degrees. * **97.5% Percentile:** Located at approximately θ = 95 degrees. * **FWHM:** The FWHM spans from approximately θ = -45 degrees to θ = 45 degrees. * **-FWHM/2:** Located at approximately θ = -45 degrees. * **+FWHM/2:** Located at approximately θ = 45 degrees. * **Half-maximum:** The half-maximum value on the y-axis is approximately 0.3. The curve appears to be a Gaussian-like distribution, but adapted for circular data. ### Key Observations * The distribution is symmetrical around θ = 0. * The FWHM provides a measure of the spread of the distribution. 
* The percentile markers indicate the range of angles corresponding to specific probabilities. * The maximum value of the PDF is approximately 0.6. ### Interpretation This chart demonstrates a circular probability distribution. The function f(θ) PDF describes the likelihood of observing a particular angle θ. The symmetry suggests that angles are equally likely to occur on either side of the central tendency (θ = 0). The FWHM and percentile markers provide quantitative measures of the distribution's spread and shape. This type of distribution is commonly used in fields like directional statistics, where data is inherently circular (e.g., wind direction, animal orientation). The chart effectively visualizes how probabilities are distributed around a circle, offering insights into the concentration or dispersion of directional data. The use of percentiles allows for easy interpretation of the probability associated with specific angular ranges. </details>

Figure 2: Circular distribution prior (FWHM $90.2^{\circ}$) contains $\left\{{50,90,95}\right\}\%$ of normalization angles within $\left|{\theta}\right|\leq\left\{{27.4,\,72,\,90}\right\}^{\circ}$ of the mean angle.

Suppose we have measured a normalization angle $\bar{\theta}$ belonging to the $n^{th}$ loudspeaker with known measurement dispersion $\bar{\ell}$ such that the likelihood function $f(\bar{\theta}\,|\,\mu=\bm{\bar{\theta}}_{n},\ell=\bar{\ell})$ follows the squared-exponential chordal function in (11). Let the unknown normalization angle $\bm{\bar{\theta}}_{n}$ of the $n^{th}$ loudspeaker have a squared-exponential chordal prior-distribution $f(\bm{\bar{\theta}}_{n}|\,\mu=\mu_{n},\ell=\ell_{n})$ with initial hyperparameters $\mu_{n}=0$ mean azimuth and $\ell_{n}=\sqrt{2/\ln(2)}$ maximum dispersion. The posterior normalization angle therefore has a conjugate distribution with hyperparameters following appendix (27).
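The FWHM-to-dispersion relation and the interval masses quoted above can be checked with a few lines of numerical integration. This is a sketch under the assumption that the density is proportional to $e^{\cos(\theta-\mu)/\ell^{2}}$, which follows from expanding the squared chordal distance in (11); the helper names are illustrative.

```python
import math


def dispersion_from_fwhm(fwhm_deg):
    """Dispersion ell for a given full-width at half-maximum in degrees."""
    return 2.0 * math.sin(math.radians(fwhm_deg) / 4.0) / math.sqrt(2.0 * math.log(2.0))


def mass_within(half_width_deg, ell, n=20000):
    """Fraction of probability mass in |theta - mu| <= half_width (midpoint rule)."""
    kappa = ell ** -2
    h = 2.0 * math.pi / n
    limit = math.radians(half_width_deg)
    inside = total = 0.0
    for i in range(n):
        th = -math.pi + (i + 0.5) * h
        w = math.exp(kappa * (math.cos(th) - 1.0))  # unnormalized density
        total += w
        if abs(th) <= limit:
            inside += w
    return inside / total


ell_window = dispersion_from_fwhm(60.0)    # axial listening window
ell_prior = dispersion_from_fwhm(90.22)    # informative prior dispersion
ell_max = dispersion_from_fwhm(360.0)      # upper limit, sqrt(2 / ln 2)

frontal_mass_prior = mass_within(90.0, ell_prior)
frontal_mass_max = mass_within(90.0, ell_max)
```

Normalizing by the numerical total avoids evaluating the Bessel term, at the cost of a small discretization error near the interval boundaries.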
Over multiple time-steps $t$, the likelihood, prior, and posterior functions across measured angles $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$ with dispersion $\bar{\ell}_{n}^{\left\{{t}\right\}}$ are given by

$$ \begin{split}L\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)&=f\left({\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}\,|\,\mu=\bm{\bar{\theta}}_{n},\ell=\bar{\ell}_{n}^{\left\{{t}\right\}}}\right),\quad\textrm{Likelihood}\\ P(\bm{\bar{\theta}}_{n})&=f\left({\bm{\bar{\theta}}_{n}\,|\,\mu=\mu_{n}^{\left\{{t-1}\right\}},\,\ell=\ell_{n}^{\left\{{t-1}\right\}}}\right),\quad\textrm{Prior}\\ P\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)&\propto L\left({\bm{\bar{\theta}}_{n}\,|\,\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}}\right)P(\bm{\bar{\theta}}_{n}),\qquad\textrm{Posterior}\end{split} \tag{13} $$

where the reported normalization angle $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$ is a point-estimate taken within a measurement session, and the dispersion $\bar{\ell}_{n}^{\left\{{t}\right\}}$ is proportional to the point-estimate’s confidence interval. Both quantities can vary over time as the listener’s location may change between sessions (e.g. different seating), and may be measured under different noise conditions. The initial hyperparameters for mean $\mu_{n}^{\left\{{0}\right\}}=0$ and dispersion $\ell_{n}^{\left\{{0}\right\}}=0.6515$ (FWHM $90.22^{\circ}$) are informative as loudspeakers generally orient towards the intended listening area.
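The posterior in (13) has a closed form that can be sketched directly from (11). The appendix update is not reproduced in this section, so the code below is a derivation under stated assumptions rather than the paper's own formula: since $d^{2}(\theta-\mu)=2-2\cos(\theta-\mu)$, each density is proportional to $e^{\cos(\theta-\mu)/\ell^{2}}$, and multiplying likelihood and prior adds two phasors with magnitudes $\ell^{-2}$ and angles $\mu$. The measurement sequence is hypothetical.

```python
import cmath
import math


def posterior_update(mu_prev, ell_prev, theta_meas, ell_meas):
    """Combine the prior (mu_prev, ell_prev) with one measured angle.

    Returns the posterior mean azimuth (radians) and dispersion, obtained by
    summing the prior and likelihood phasors exp(j * angle) / ell ** 2.
    """
    z = (cmath.exp(1j * mu_prev) / ell_prev ** 2
         + cmath.exp(1j * theta_meas) / ell_meas ** 2)
    if abs(z) < 1e-12:
        # Opposing evidence of equal weight cancels: the posterior is uniform.
        return mu_prev, math.inf
    return cmath.phase(z), abs(z) ** -0.5


# Informative prior: on-axis mean with FWHM of about 90.22 degrees.
mu, ell = 0.0, 0.6515
history = [(mu, ell)]
for measured_deg in (40.0, 50.0, 45.0):        # hypothetical session estimates
    mu, ell = posterior_update(mu, ell, math.radians(measured_deg), 0.3)
    history.append((mu, ell))
```

With measurements that broadly agree, the dispersion shrinks at every step and the mean moves from the prior toward the measured angles. A measurement far from the current mean can instead increase the dispersion, which is a property of circular evidence that a linear Gaussian update does not share.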
The posterior estimate of $\bm{\bar{\theta}}_{n}$ follows Bayes’ theorem, where the current mean $\mu_{n}^{\left\{{t}\right\}}$ and dispersion $\ell_{n}^{\left\{{t}\right\}}$ hyperparameters are updated from the measurement terms $\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n},\bar{\ell}_{n}^{\left\{{t}\right\}}$ in the likelihood function and the previous hyperparameters $\mu_{n}^{\left\{{t-1}\right\}},\ell_{n}^{\left\{{t-1}\right\}}$ via appendix (28). Lastly, the normalization filter’s quotient terms (9), (10) are updated for PDF $f_{n}(\theta)=f(\theta\,|\,\mu=\mu_{n}^{\left\{{t}\right\}},\ell=\ell_{n}^{\left\{{t}\right\}})$, and the filters $G_{n}(\nu)$ are re-computed. Let us step through the following example:

<details> <summary>figs/fst_window_kernel_sample_transfer_functions.png Details</summary> ![5b21ab6c](/v1/image/5b21ab6c603dac6c711e035540880db881da2904440ac29d4be6bc61a286e680) ### Visual Description \n ## Heatmap: Loudspeaker Acoustic Transfer Function ### Overview The image presents a 2D heatmap visualizing the Loudspeaker Acoustic Transfer Function SA(ν, θ). The heatmap displays the magnitude in decibels (dB) as a function of frequency (ν) and azimuth angle (θ). The color gradient represents the magnitude, ranging from 0 dB (red) to -20 dB (blue). ### Components/Axes * **Title:** Loudspeaker Acoustic Transfer Function SA(ν, θ) - positioned at the top-center. * **X-axis:** Azimuth θ (Degrees) - ranging from -180 to 180 degrees, with tick marks at -180, -135, -90, -45, 0, 45, 90, 135, and 180 degrees. * **Y-axis:** Frequency (Hz) - displayed on a logarithmic scale, with tick marks at $10^{2}$, $10^{3}$, and $10^{4}$ Hz. * **Colorbar:** Located on the right side of the heatmap, representing Magnitude (dB). The colorbar ranges from 0 dB (red) to -20 dB (blue), with intermediate values indicated by color gradients. ### Detailed Analysis The heatmap shows a complex pattern of acoustic transfer function magnitude across frequency and azimuth.
* **High Frequencies (around $10^{4}$ Hz):** A narrow, intense peak is visible at 0 degrees azimuth. The magnitude is approximately 0 dB (red). The magnitude decreases rapidly as the azimuth angle moves away from 0 degrees in either direction. * **Mid Frequencies (around $10^{3}$ Hz):** Two peaks are visible, one around -45 degrees and another around 45 degrees. The magnitude at these peaks is approximately -5 dB (yellow-orange). There is a dip in magnitude around 0 degrees, with values around -10 dB (green). * **Low Frequencies (around $10^{2}$ Hz):** The magnitude is generally low across all azimuth angles. The magnitude ranges from approximately -15 dB (light green) to -20 dB (blue). There is a slight increase in magnitude towards the -90 and 90 degree azimuths, reaching approximately -10 dB (green). * **Azimuth -180 to 180 degrees:** The magnitude generally decreases as the azimuth angle moves away from 0 degrees at higher frequencies. At lower frequencies, the magnitude is relatively consistent across all azimuth angles. * **Frequency $10^{2}$ to $10^{4}$ Hz:** The magnitude generally increases with frequency, particularly around 0 degrees azimuth. ### Key Observations * The loudspeaker exhibits a highly directional response at high frequencies, with maximum output at 0 degrees azimuth. * The acoustic transfer function is more omnidirectional at low frequencies. * There are noticeable nulls (areas of low magnitude) in the acoustic transfer function at certain frequencies and azimuth angles. * The response is approximately symmetrical around the 0-degree azimuth. ### Interpretation The heatmap demonstrates the frequency-dependent directivity of the loudspeaker. At higher frequencies, the sound is focused in a narrow beam, while at lower frequencies, the sound radiates more broadly. This behavior is typical of many loudspeakers, where the wavelength of the sound is comparable to the size of the speaker diaphragm.
The nulls in the acoustic transfer function likely correspond to destructive interference patterns caused by the speaker's geometry and the surrounding environment. The data suggests that the loudspeaker is optimized for projecting sound directly forward (0 degrees azimuth) at high frequencies. The logarithmic scale on the y-axis emphasizes the relative changes in magnitude across different frequencies, highlighting the significant drop in magnitude at lower frequencies. The colorbar provides a clear visual representation of the magnitude scale, allowing for easy interpretation of the heatmap data. </details> <details> <summary>figs/fst_window_kernel_bayes_circular_dist.png Details</summary> ![511ca76b](/v1/image/511ca76bbc4d5db0f509381e98f0db268dd0363026e84ab8b01f497c9823bd8d) ### Visual Description ## Chart: Normalization Angle Circular Distributions across Time-steps ### Overview The image presents a line chart illustrating the evolution of probability distributions (circular distributions) of an angle (Azimuth θ) over four time steps (t = 0, 1, 2, 3). The chart compares a "Ground Truth" distribution with prior and posterior distributions calculated at each time step, as well as likelihood distributions at t=1, t=2, and t=3. The y-axis represents the probability density function f(θ). ### Components/Axes * **Title:** Normalization Angle Circular Distributions across Time-steps * **X-axis:** Azimuth θ (Degrees), ranging from -180 to 180. Markers are present at -180, -135, -90, -45, 0, 45, 90, 135, and 180. * **Y-axis:** f(θ) (Probability Density Function), ranging from 0 to 2. Markers are present at 0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, and 2.0. * **Legend:** Located in the top-left corner. * Ground Truth: Dashed blue line. * t = 0: Prior: Solid green line. * t = 1: Likelihood: Dotted orange line. * t = 1: Posterior: Solid orange line. * t = 2: Likelihood: Dotted brown line. * t = 2: Posterior: Solid brown line. * t = 3: Likelihood: Dotted purple line. 
* t = 3: Posterior: Solid purple line. ### Detailed Analysis The chart displays several curves representing probability distributions. * **Ground Truth (Dashed Blue):** This distribution is sharply peaked around 90 degrees, with a maximum value of approximately 2.0. It has minimal probability density outside of a narrow range around 90 degrees. * **t = 0: Prior (Solid Green):** This distribution is relatively flat, with a peak around 0 degrees and a maximum value of approximately 0.5. It has a wider spread than the Ground Truth. * **t = 1: Likelihood (Dotted Orange):** This distribution is centered around -45 degrees, with a maximum value of approximately 0.6. It is broader than the Ground Truth. * **t = 1: Posterior (Solid Orange):** This distribution is centered around 45 degrees, with a maximum value of approximately 1.6. It is more concentrated than the Prior and closer to the Ground Truth. * **t = 2: Likelihood (Dotted Brown):** This distribution is centered around -90 degrees, with a maximum value of approximately 0.4. * **t = 2: Posterior (Solid Brown):** This distribution is centered around 0 degrees, with a maximum value of approximately 0.8. It is more concentrated than the Likelihood and is shifting towards the Ground Truth. * **t = 3: Likelihood (Dotted Purple):** This distribution is centered around -45 degrees, with a maximum value of approximately 0.2. * **t = 3: Posterior (Solid Purple):** This distribution is centered around 90 degrees, with a maximum value of approximately 1.2. It is the closest to the Ground Truth among all posterior distributions. 
**Approximate Data Points (extracted visually):** | Time Step | Distribution | Azimuth (θ) | f(θ) | |---|---|---|---| | 0 | Prior | 0 | 0.5 | | 1 | Likelihood | -45 | 0.6 | | 1 | Posterior | 45 | 1.6 | | 2 | Likelihood | -90 | 0.4 | | 2 | Posterior | 0 | 0.8 | | 3 | Likelihood | -45 | 0.2 | | 3 | Posterior | 90 | 1.2 | | Ground Truth | | 90 | 2.0 | ### Key Observations * The Prior distribution (t=0) is initially quite diffuse and doesn't resemble the Ground Truth. * The Likelihood distributions at t=1, t=2, and t=3 are consistently shifted to the left of the Ground Truth. * The Posterior distributions progressively converge towards the Ground Truth as time steps increase. The posterior at t=3 is the closest approximation. * The peak of the Posterior distributions shifts from negative angles (t=1) to positive angles (t=3), indicating a learning process. ### Interpretation This chart demonstrates a Bayesian filtering or state estimation process. The "Ground Truth" represents the actual angle, while the Prior represents an initial belief about the angle. The Likelihood functions represent observations or measurements that provide information about the angle. The Posterior distributions are updated beliefs about the angle, combining the Prior and the Likelihood. The chart shows how the system learns over time, refining its estimate of the angle based on incoming observations. The convergence of the Posterior distributions towards the Ground Truth indicates that the filtering process is working effectively. The shift in the Likelihood distributions suggests that the measurement process may have a systematic bias or error. The fact that the posterior at t=3 is not *exactly* the ground truth suggests that the system is still imperfect, or that the likelihood function is not perfectly representative of the measurement noise. The chart provides a visual representation of how Bayesian inference can be used to track and estimate a dynamic variable over time. 
</details> <details> <summary>figs/fst_window_kernel_bayes_filters.png Details</summary> ![3dbcd023](/v1/image/3dbcd02382b42ba1124a6963477db771c8fabe5437824e6eacfedeefd3ec342a) ### Visual Description ## Charts: Acoustic Windowed Power and Correction Quotient ### Overview The image contains two charts. The top chart displays "Acoustic Windowed Power Averages" as a function of frequency, showing multiple lines representing different time steps (t = 0, 1, 2, 3) and an axial window. The bottom chart shows the "Acoustic Windowed Correction Quotient" also as a function of frequency, again with lines for different time steps. Both charts use a logarithmic frequency scale and a magnitude scale in decibels (dB). ### Components/Axes **Top Chart:** * **Title:** Acoustic Windowed Power Averages ŜA(ν, f(θ)) * **X-axis:** Frequency (Hz), logarithmic scale from 102 to 105. * **Y-axis:** Magnitude (dB), scale from -20 to -5. * **Legend:** * t = 0: Prior Window (Black) * t = 1: Posterior Window (Purple) * t = 2: Posterior Window (Orange) * t = 3: Posterior Window (Brown) * Axial Window ŜA(ν, f0(θ)) (Blue Dashed) **Bottom Chart:** * **Title:** Acoustic Windowed Correction Quotient Q̂(ν) * **X-axis:** Frequency (Hz), logarithmic scale from 102 to 105. * **Y-axis:** Magnitude (dB), scale from 0 to 15. * **Legend:** * t = 0: Prior Window (Black) * t = 1: Posterior Window (Purple) * t = 2: Posterior Window (Orange) * t = 3: Posterior Window (Brown) ### Detailed Analysis or Content Details **Top Chart - Acoustic Windowed Power Averages:** * **t = 0 (Black):** The line is relatively flat around -10dB from 102 Hz to approximately 103 Hz, then decreases to approximately -18dB at 104 Hz, with oscillations. * **t = 1 (Purple):** The line starts at approximately -6dB at 102 Hz, rises to a peak of approximately -3dB around 103 Hz, then rapidly decreases to approximately -20dB at 104 Hz, with oscillations. 
* **t = 2 (Orange):** The line starts at approximately -7dB at 102 Hz, rises to a peak of approximately -4dB around 103 Hz, then rapidly decreases to approximately -19dB at 104 Hz, with oscillations. * **t = 3 (Brown):** The line starts at approximately -7dB at 102 Hz, rises to a peak of approximately -4dB around 103 Hz, then rapidly decreases to approximately -19dB at 104 Hz, with oscillations. * **Axial Window (Blue Dashed):** The line starts at approximately -8dB at 102 Hz, rises to a peak of approximately -5dB around 103 Hz, then decreases to approximately -16dB at 104 Hz, with oscillations. **Bottom Chart - Acoustic Windowed Correction Quotient:** * **t = 0 (Black):** The line is relatively flat around -2dB from 102 Hz to approximately 104 Hz, then rises to approximately 12dB at 104 Hz, with oscillations. * **t = 1 (Purple):** The line starts at approximately 0dB at 102 Hz, rises to a peak of approximately 14dB around 104 Hz, with oscillations. * **t = 2 (Orange):** The line starts at approximately -1dB at 102 Hz, rises to a peak of approximately 13dB around 104 Hz, with oscillations. * **t = 3 (Brown):** The line starts at approximately -1dB at 102 Hz, rises to a peak of approximately 13dB around 104 Hz, with oscillations. ### Key Observations * In the top chart, the "Prior Window" (t=0) has a lower magnitude than the "Posterior Windows" (t=1, 2, 3) at lower frequencies, but the differences diminish at higher frequencies. * The "Posterior Windows" (t=1, 2, 3) are very similar to each other. * In the bottom chart, the "Prior Window" (t=0) has a lower magnitude than the "Posterior Windows" (t=1, 2, 3) across the entire frequency range. * The "Posterior Windows" (t=1, 2, 3) are very similar to each other. * Both charts exhibit oscillatory behavior at higher frequencies (around 104 Hz). ### Interpretation The charts likely represent the effect of applying a time-varying window function to acoustic data. 
The "Prior Window" (t=0) represents the initial state, while the "Posterior Windows" (t=1, 2, 3) represent the data after applying the window function for increasing time steps. The top chart shows how the windowing affects the power spectrum of the acoustic signal. The increase in magnitude at lower frequencies for the posterior windows suggests that the windowing process is enhancing the lower frequency components. The decrease in magnitude at higher frequencies suggests that the windowing process is attenuating the higher frequency components. The bottom chart shows the correction quotient, which likely represents the ratio of the windowed power spectrum to the original power spectrum. The positive values indicate that the windowing process is amplifying the signal in those frequency ranges. The similarity between the posterior windows suggests that the windowing process is converging over time. The oscillatory behavior at higher frequencies could be due to the shape of the window function or the presence of noise in the data. The charts demonstrate how a windowing function can be used to modify the frequency content of an acoustic signal, potentially for noise reduction or signal enhancement. The consistent behavior of t=1, t=2, and t=3 suggests the windowing process reaches a stable state after the first time step. </details> Figure 3: We equalize a sample loudspeaker with acoustic responses over the horizontal plane (left) between Bayesian estimates of the normalization angle $\bm{\bar{\theta}}$ in (13) (center) and the axial windowed power average. The acoustic power averages (right) over the posterior circular distribution windows $f(\theta\,|\,\mu=\mu^{\left\{{t}\right\}},\ell=\ell^{\left\{{t}\right\}})$ update across time-steps to yield a sequence of quotient correction targets in (10). Consider the sample loudspeaker responses and sequence of estimated normalization angles in Fig. 
3 where the listener is $90^{\circ}$ offset from the loudspeaker axis in azimuth. At $t=0$, prior to any measurements, the normalization angle assumes a circular distribution centered on the loudspeaker axis $\mu=0$ with wide dispersion FWHM $90.22^{\circ}$. The first estimate $\bm{\bar{\theta}}^{\left\{{1}\right\}}=-60^{\circ}$ is inaccurate with high dispersion FWHM $90^{\circ}$ as shown in the $t=1$ likelihood. Although the posterior shifts its mean halfway between the prior’s mean and the estimated angle, the dispersion remains high, which gives a similar acoustic windowed power average and correction quotient to that of the prior. The second estimate $\bm{\bar{\theta}}^{\left\{{2}\right\}}=75^{\circ}$ is more accurate with lower dispersion FWHM $45^{\circ}$. The resulting posterior shifts much closer towards the estimate at much reduced dispersion, which distinguishes its windowed power average and correction quotient from the prior. The final and most accurate estimate $\bm{\bar{\theta}}^{\left\{{3}\right\}}=90^{\circ}$ with lowest dispersion FWHM $30^{\circ}$ yields a sharp posterior near the true normalization angle, which induces comb-filter patterns in the correction quotient due to lobing in the loudspeaker’s anechoic response in azimuth. Therefore, in practice, we avoid equalizing to direct acoustic-paths by enforcing a lower-bound dispersion FWHM $45^{\circ}$ for circular distributions $f_{n}(\theta)$ when computing the correction quotients $\hat{Q}_{n}(\nu)$.

## 3 Loudspeaker Panning Optimization

Let $R_{n}(\nu,\bm{r})=H_{n}(\nu,\bm{r})G_{n}(\nu)$ be the acoustic response at frequency $\nu$ and coordinate $\bm{r}$ of the $n^{th}$ normalized loudspeaker in (4). The overall response of the normalized loudspeaker array follows $$ \begin{split}Y(\nu,\bm{r})=\sum_{n=1}^{N}R_{n}(\nu,\bm{r})X_{n}(\nu),\end{split} \tag{14} $$ where $X_{n}(\nu)$ is the transfer function of the array’s weights belonging to the $n^{th}$ loudspeaker.
For normalized loudspeaker panning, we constrain $X_{n}(\nu)$ to have a common phase-component (e.g. delay or all-pass) across loudspeakers and solve for the unknown magnitude components $x_{n}(\nu)=\left|{X_{n}(\nu)}\right|$, which are subject to frequency-dependent spatial-electrical-acoustic domain constraints. The magnitude components at frequency $\nu$ are therefore expressed as a vector of panning gains $\bm{x}=\left[{x_{1},\ldots,x_{N}}\right]^{T}\in\mathbb{R}^{N\times 1}$, where we omit the frequency $\nu$ to simplify notation. Further simplifications following the loudspeaker normalization are possible when specifying domain-specific constraints. Loudspeaker coordinates reduce to their unit-directions in the spatial domain given by $$ \begin{split}\bm{V}=\left[{\bm{v}_{1},\ldots,\bm{v}_{N}}\right]\in\mathbb{R}^{2\times N},\quad\bm{v}_{n}=\frac{\bm{u}_{n}}{\left\lVert\bm{u}_{n}\right\rVert}.\end{split} \tag{15} $$ The normalization filter’s electrical gain $\left|{G_{n}(\nu)}\right|$ bounds the electrical headroom in the electrical domain. The normalized loudspeaker acoustic responses in (4) are matched at the listener’s location in the acoustical domain.

Spatial Panning Constraints: The vector-base amplitude panning with slack (VBAPS) constraint is given by $$ \begin{split}\bm{V}\bm{x}=\lambda\bm{s},\quad\bm{x}\geq\bm{0},\quad\lambda\geq 0,\end{split} \tag{16} $$ where the panning gains $\bm{x}$ are non-negative so as to preserve the relative-phase between loudspeaker pairs, and constrain the weighted average of the loudspeaker directions $\bm{V}$ to coincide with the target steering unit-direction $\bm{s}\in\mathbb{R}^{2\times 1}$ up to a non-negative scale given by the slack-variable $\lambda$. The latter is an augmented variable that both scales the target unit-direction $\bm{s}$ to lie in equality with the panning direction $\bm{V}\bm{x}$ as shown in Fig. 4, and accommodates constraints placed on $\bm{x}$ from other domains.
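The VBAPS constraint in (16) can be illustrated for a single loudspeaker pair. The following is a minimal sketch, not the paper's implementation: the helper names `unit` and `vbaps_pair`, the axis convention (x pointing to the front), and the choice to fix $\lambda=1$ are assumptions made for illustration. It solves the $2\times 2$ system by Cramer's rule and reports infeasibility when a gain would be negative.

```python
import math

def unit(deg):
    """Unit direction for an azimuth in degrees (x = front; an assumed convention)."""
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def vbaps_pair(v1, v2, s):
    """Solve [v1 v2] x = lam * s for one loudspeaker pair with lam fixed to 1.

    Returns the gains (x1, x2), or None when a gain would be negative,
    i.e. when s lies outside the minor arc between v1 and v2.
    """
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if abs(det) < 1e-12:
        return None  # collinear directions: the 2x2 system is singular
    x1 = (s[0] * v2[1] - v2[0] * s[1]) / det  # Cramer's rule
    x2 = (v1[0] * s[1] - s[0] * v1[1]) / det
    if x1 < -1e-12 or x2 < -1e-12:
        return None
    return (max(x1, 0.0), max(x2, 0.0))

# A pair at +/-30 degrees steering to the center: equal gains 1 / (2 cos 30 deg).
center_gains = vbaps_pair(unit(30), unit(-30), unit(0))
# The same pair steering to the rear (major arc): infeasible.
rear_gains = vbaps_pair(unit(30), unit(-30), unit(180))
```

Because (16) is homogeneous in $(\bm{x},\lambda)$, gains for any other $\lambda\geq 0$ follow by scaling the returned pair by $\lambda$.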
The feasible steering and panning directions, and panning gains are therefore constrained as follows: <details> <summary>figs/vbaps.png Details</summary> ![f7ebcfb0](/v1/image/f7ebcfb04642c4948a7d3ae8b4ed1960e57c9c70a9c62cacca704938bfdf09ac) ### Visual Description \n ## Diagram: Acoustic Localization Feasibility ### Overview This diagram illustrates the feasibility of acoustic localization based on the relative positions of two sound sources (speakers), a listener, and the target location. It uses concentric dotted circles to represent regions of feasible and infeasible localization, determined by the speed of sound and the distances involved. ### Components/Axes The diagram features the following components: * **Target s:** Located at the top-center of the diagram. * **Listener:** Positioned at the center of the diagram. * **Speaker 1 (v₁):** Located on the left side of the diagram. * **Speaker 2 (v₂):** Located on the right side of the diagram. * **Feasible s:** A region above the listener, indicated by a blue dotted circle. * **Infeasible s:** A region below the listener, indicated by a yellow dotted circle. * **λs:** A purple arrow pointing upwards from the listener towards the target. * **v₁x₁:** A purple arrow pointing from Speaker 1 towards the target. * **v₂x₂:** A purple arrow pointing from Speaker 2 towards the target. * **v₁x₁ + v₂x₂:** A purple arrow representing the vector sum of v₁x₁ and v₂x₂ pointing towards the target. ### Detailed Analysis / Content Details The diagram depicts a scenario where two speakers (v₁ and v₂) are attempting to localize a target 's' relative to a listener. The feasibility of localization is determined by the geometry of the sound paths. * **Speaker 1 (v₁):** Positioned approximately 45 degrees to the left of the vertical axis. * **Speaker 2 (v₂):** Positioned approximately 45 degrees to the right of the vertical axis. * **Target s:** Directly above the listener. 
* **Feasible s:** The blue dotted circle encompasses the area above the listener where localization is considered feasible. The radius of this circle is approximately 1.5 times the distance from the listener to either speaker. * **Infeasible s:** The yellow dotted circle encompasses the area below the listener where localization is considered infeasible. The radius of this circle is approximately 0.75 times the distance from the listener to either speaker. * **λs:** Represents the wavelength of the sound multiplied by the distance to the target. * **v₁x₁:** Represents the component of the velocity of sound from Speaker 1 in the direction of the target. * **v₂x₂:** Represents the component of the velocity of sound from Speaker 2 in the direction of the target. * **v₁x₁ + v₂x₂:** Represents the vector sum of the velocity components from both speakers, indicating the combined effect of their signals on the target. ### Key Observations * The diagram highlights a geometric constraint on acoustic localization. * Localization is feasible in the region above the listener (Feasible s) and infeasible below (Infeasible s). * The vector sum of the sound paths from the two speakers (v₁x₁ + v₂x₂) points directly towards the target. ### Interpretation The diagram illustrates a fundamental principle in acoustic localization: the accuracy of localization depends on the relative positions of the sound sources, the listener, and the target. The feasible and infeasible regions are defined by the constructive and destructive interference patterns of the sound waves. The diagram suggests that localization is more accurate when the target is located in a region where the sound waves from the two speakers arrive in phase (Feasible s). Conversely, localization is less accurate when the sound waves arrive out of phase (Infeasible s). 
The use of vectors (v₁x₁, v₂x₂, v₁x₁ + v₂x₂) emphasizes the importance of considering the direction and magnitude of the sound waves in determining the target's location. This diagram is a simplified representation of a complex phenomenon, but it effectively conveys the key geometric constraints on acoustic localization. </details> <details> <summary>figs/headroom.png Details</summary> ![29b95b96](/v1/image/29b95b964c43a4b48190f99ea014c42cde43e05ff0d0d0bac1439c7cfc749998) ### Visual Description \n ## Diagram: Spatial Audio Setup ### Overview The image depicts a diagram illustrating a spatial audio setup with a listener surrounded by multiple audio sources. The diagram shows the relative positions of speakers and constraints on their distances to the listener. It appears to be a schematic representation for a sound localization or virtual reality application. ### Components/Axes The diagram features: * **Listener:** Represented by a silhouette of a head in the center. * **Audio Sources:** * `u1`: A speaker positioned directly in front of the listener (red arrow). * `u2`: A speaker positioned to the left of the listener (orange arrow). * `uL`: A speaker positioned above and to the left of the listener (purple arrow). * `uR`: A speaker positioned above and to the right of the listener (blue arrow). * `uD`: A speaker positioned below the listener (grey arrow). * `uS`: A speaker positioned to the right and below the listener (green arrow). * **Distance Constraints:** Mathematical inequalities are associated with each audio source, limiting their distance from the listener. * **Concentric Circles:** Two concentric circles are drawn around the listener, indicating distances of 1 meter and 2 meters. * **Dotted Circle:** A dotted circle is drawn around the listener, indicating a distance of 1 meter. ### Detailed Analysis or Content Details The diagram defines spatial constraints for each audio source: * **u1:** `x1 ≤ 1`, `2x2 ≤ 1` * **u2:** No explicit constraint is shown. 
* **uL:** `2xL ≤ 1` * **uR:** `3xR ≤ 1` * **uD:** `xS ≤ 1`, `4xD ≤ 1` * **uS:** No explicit constraint is shown. The distances from the listener are indicated by the concentric circles: * The inner dotted circle represents a radius of 1 meter. * The outer dotted circle represents a radius of 2 meters. The arrows indicate the direction of the sound source relative to the listener. The red arrow (u1) points directly forward. The orange arrow (u2) points to the left. The purple arrow (uL) points up and to the left. The blue arrow (uR) points up and to the right. The grey arrow (uD) points down. The green arrow (uS) points down and to the right. ### Key Observations * The constraints on the audio sources suggest a limitation on their maximum distance from the listener. * The constraints are not uniform across all sources. For example, `uR` has a stricter constraint (`3xR ≤ 1`) than `uL` (`2xL ≤ 1`). * The diagram focuses on the relative positioning of the audio sources and their distance from the listener, rather than absolute coordinates. * The constraints appear to be related to the perceived spatial location of the sound sources. ### Interpretation This diagram likely represents a simplified model for spatial audio rendering or sound localization. The constraints on the audio source distances are likely intended to ensure that the sound sources are perceived as being within a reasonable range for a realistic spatial audio experience. The different constraints for different speakers could be related to their intended role in the sound field (e.g., front speakers might have different constraints than surround speakers). The concentric circles provide a visual reference for the distances involved. The diagram suggests a system where the audio sources are positioned strategically around the listener to create a convincing 3D soundscape. 
The mathematical constraints are likely used in an algorithm to determine the appropriate volume and delay for each audio source to achieve the desired spatial effect. The diagram does not provide any information about the frequency content of the sounds or the characteristics of the speakers themselves. </details>

Figure 4: VBAPS (left) constrains the feasible steering direction $\bm{s}$ to lie on the minor arc between the loudspeaker pair coordinates $\bm{x}_{L},\bm{x}_{R}$. Sample voltage constraints (right) are proportional to differences in loudspeaker-to-listener distance, orientation, and selection.

Consider a set of $N$ loudspeakers and panning gains satisfying (16). The feasible steering unit-directions $\bm{s}$ lie in the union of minor arcs between all pairwise loudspeaker unit-directions. Conversely, steering directions are infeasible along the major arc of a single loudspeaker pair $N=2$ as shown in Fig. 4. For $N>2$ loudspeakers, the feasible $\bm{s}$ cover every direction in $\mathbb{R}^{2}$ iff there exists a set of three loudspeakers where the negative direction of each loudspeaker lies on the minor arc between the other two loudspeaker directions. The panning direction $\bm{V}\bm{x}$ is therefore constrained to be in the set of $\lambda$-scaled feasible unit-directions $\bm{s}$. We now introduce several evaluation metrics or objectives w.r.t. $\lambda$. Let us define panning sensitivity as the acoustic-path distance ratio between the panning direction and the sum of the gain-scaled component loudspeaker directions, given by $$ \begin{split}\mathbb{S}(\bm{V},\bm{x},\bm{s})=\frac{\left\lVert\bm{V}\bm{x}\right\rVert}{\sum_{n=1}^{N}\left\lVert\bm{v}_{n}x_{n}\right\rVert}=\frac{\left\lVert\lambda\bm{s}\right\rVert}{\sum_{n=1}^{N}x_{n}}=\frac{\lambda}{\bm{x}^{T}\bm{1}},\end{split} \tag{17} $$ which has bounds $0<\mathbb{S}(\bm{V},\bm{x},\bm{s})\leq 1$.
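The sensitivity ratio in (17) is simple to evaluate numerically. The sketch below is illustrative only: the function name `panning_sensitivity` and the helper `unit` are hypothetical, and the code assumes unit-length directions and non-negative gains with at least one non-zero entry.

```python
import math

def unit(deg):
    """Unit direction for an azimuth in degrees (x = front; an assumed convention)."""
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def panning_sensitivity(directions, gains):
    """||V x|| / sum_n x_n, following (17), for unit directions and gains x >= 0."""
    px = sum(v[0] * x for v, x in zip(directions, gains))
    py = sum(v[1] * x for v, x in zip(directions, gains))
    return math.hypot(px, py) / sum(gains)

# A single loudspeaker on the steering direction reaches the upper bound of 1.
s_single = panning_sensitivity([unit(0)], [1.0])
# A pair at +/-30 degrees with equal gains gives cos(30 deg).
s_fronts = panning_sensitivity([unit(30), unit(-30)], [0.5, 0.5])
# A pair at +/-110 degrees with equal gains gives |cos(110 deg)|, much lower.
s_surrounds = panning_sensitivity([unit(110), unit(-110)], [0.5, 0.5])
```

The three cases show the trend stated in the text: sensitivity drops as the active loudspeaker directions spread away from the panning direction.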
Sensitivity is maximal iff non-zero panning gains belong to loudspeakers with directions coincident to the steering direction, large if panning gains disproportionately allocate to loudspeakers with directions closer to the steering direction, and minimal when panning gains allocate to loudspeakers with directions that sum to zero. Panning sensitivity therefore gives a similarity measure between panned and discrete sound-sources in the direction of $\bm{s}$. This contrasts with cross-domain measures of panning efficiency, which evaluate the power ratios between panning direction and electric or acoustic gain as follows: $$ \begin{split}\mathbb{F}(\bm{K},\bm{V},\bm{x})=\frac{\bm{x}^{T}\bm{V}^{T}\bm{V}\bm{x}}{\bm{x}^{T}\bm{K}\bm{x}}=\lambda^{2}\left\lVert\bm{K}^{\frac{1}{2}}\bm{x}\right\rVert^{-2},\end{split} \tag{18} $$ where $\bm{K}\in\mathbb{C}^{N\times N}$ is a domain-dependent covariance matrix (identity for electrical, model dependent for acoustical). For the electrical domain where $\bm{K}=\bm{I}$, the maximum efficiency is $N$ for loudspeakers with directions coincident to the steering direction and uniform panning gains $\bm{x}=N^{-1}\bm{1}$. For the acoustic domain, the maximum efficiency is the largest generalized eigenvalue of the pair $\bm{V}^{T}\bm{V}$ and $\bm{K}$. Thus, higher panning efficiency is realized via more uniformly distributed panning gains across loudspeakers, whereas high panning sensitivity follows sparsely distributed panning gains.

Electrical Headroom Constraints: The electrical-power headroom of normalized loudspeakers decreases in proportion to the normalization filter power responses $\left|{G_{n}(\nu)}\right|^{2}$.
Under the non-negative panning constraint, the quadratic electrical-power constraints are linearized as follows: $$ \begin{split}x_{n}x_{n}^{*}&\leq\left|{G_{n}(\nu)}\right|^{-2},\quad x_{n}\geq 0,\quad\Rightarrow\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{19} $$ where $\bm{\tau}=\left[{\left|{G_{1}(\nu)}\right|^{-1},\ldots,\left|{G_{N}(\nu)}\right|^{-1}}\right]^{T}\in\mathbb{R}^{N\times 1}_{\geq 0}$ is a vector containing the digital headroom per loudspeaker that bounds the feasible space of panning gains to the upper box-orthant. We give several examples of voltage headroom consumed by normalization in Fig. 4. Doubling loudspeaker $\bm{u}_{1}$’s distance to the listener to match that of $\bm{u}_{2}$ halves the voltage headroom. Re-orienting the loudspeaker $\bm{u}_{R}$ to face the listener at $\bm{u}_{L}$ lowers high-frequency headroom. Equalizing the mid-range loudspeaker at $\bm{u}_{D}$ to match the full-range loudspeaker at $\bm{u}_{S}$ decreases the low-frequency headroom.

Acoustical Power Constraints: The acoustic covariances between the normalized loudspeaker transfer functions $R_{n}(\nu,\bm{r})$ in (14), over coordinates $\bm{r}$ in the listening area, specify quadratic power constraints in equality to the acoustic power target $\rho$ as follows: $$ \begin{split}\bm{x}^{T}\bm{K}\bm{x}&=\rho,\quad K_{ij}\approx\mathbb{E}_{\bm{r}\sim g(\bm{r})}\left[{R_{i}(\nu,\bm{r})R_{j}^{*}(\nu,\bm{r})}\right],\end{split} \tag{20} $$ where $\bm{r}$ is sampled from a disc of radius $\tau_{r}$ with a truncated uniform PDF $g(\bm{r})=\frac{1}{\pi\tau_{r}^{2}},\forall\,\left\lVert\bm{r}\right\rVert\leq\tau_{r}$, and $0$ otherwise.
For loudspeaker transfer functions in the far-field, spherical-waves can be approximated by plane-waves, which give the acoustic covariance matrix $\bar{\bm{K}}$ with analytic terms $\bar{K}_{ij}$ as derived in appendix (31) as follows: $$ \begin{split}\bar{K}_{ij}=\left|{S(\nu,0)}\right|^{2}\begin{cases}\frac{2J_{1}\left({D_{ij}\kappa\tau_{r}}\right)}{D_{ij}\kappa\tau_{r}},&D_{ij}\kappa\tau_{r}>0\\ 1,&D_{ij}\kappa\tau_{r}=0\end{cases},\end{split} \tag{21} $$ where $D_{ij}=\left\lVert\bm{v}_{i}-\bm{v}_{j}\right\rVert$ is the distance between loudspeaker unit-directions, and $J_{1}(z)$ is the Bessel function of the first kind of order one. Note that at the listener location $\bm{r}=\bm{0}$, the normalized loudspeaker transfer functions are constant in (4). Thus, the acoustic covariance matrix $\bar{\bm{K}}$ degenerates to the rank-1 matrix $\mathring{\bm{K}}=\left|{S(\nu,0)}\right|^{2}\bm{1}\bm{1}^{T}$ as the evaluation radius decreases to zero in $\lim_{\tau_{r}\rightarrow 0}g(\bm{r})=\delta$. We therefore decompose the acoustic covariance as follows: Let the acoustic covariance matrix in (20) be a mixture of the listener location and listening area covariances given by $$ \begin{split}\bm{K}=(1-\alpha)\mathring{\bm{K}}+\alpha\bar{\bm{K}},\quad 0\leq\alpha\leq 1,\end{split} \tag{22} $$ where the acoustic covariance for $\alpha=0$ evaluates only the direct acoustic transfer function from loudspeakers to the listener location. In this case, the quadratic constraints in (20) linearize to $\bm{x}^{T}\bm{1}=\sqrt{\rho}\left|{S(\nu,0)}\right|^{-1}$ for non-negative $\bm{x}$; maximizing $\lambda$ s.t. the linear gain summation constraint maximizes the panning sensitivity. Conversely, the acoustic covariance for $\alpha=1$ evaluates the acoustic transfer functions over a larger listening area; maximizing $\lambda$ s.t. the quadratic equality constraint maximizes panning efficiency.
Moreover, the loudspeaker acoustic covariances in the listening area are correlated in the low-frequency limit $\lim_{\kappa\rightarrow 0}\bar{\bm{K}}=\mathring{\bm{K}}$, and uncorrelated in the limit of high frequency or large evaluation radius $\lim_{\kappa\rightarrow\infty}\bar{\bm{K}}=\lim_{\tau_{r}\rightarrow\infty}\bar{\bm{K}}=\bm{I}$. Therefore, in the high-frequency or large-radius limit, the mixture of covariances (22) is proportional to $\bm{K}\propto(1-\alpha)\bm{1}\bm{1}^{T}+\alpha\bm{I}$. We now formulate the loudspeaker steering optimization w.r.t. spatial, electrical, and acoustical constraints.

Optimal Panning Sensitivity and Efficiency (OPSE): Maximizing the panning sensitivity $\lambda$ subject to spatial, acoustical, and electrical constraints is the second-order cone problem [23] given by $$ \begin{split}(\lambda_{*},\bm{x}_{*})&=\arg\max_{\lambda,\bm{x}}\,\lambda\qquad\textrm{s.t.}\quad\lambda\geq 0,\\ \bm{V}\bm{x}&=\lambda\bm{s},\quad\bm{x}^{T}\bm{K}\bm{x}\leq\rho,\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{23} $$ where a feasible solution always exists if the acoustic loudness equality constraint in (20) is relaxed to an inequality; acoustic loudness is tight w.r.t. $\rho$ if panning sensitivity (17) or efficiency (18) is also maximized. We can eliminate $\lambda$ by left-multiplying both sides of the equality constraint $\bm{V}\bm{x}=\lambda\bm{s}$ in (23) by the unit-direction $\bm{s}^{T}$, which yields $\lambda=\bm{s}^{T}\bm{V}\bm{x}$; substituting back gives the equality constraint $\bm{A}\bm{x}=\bm{0}$ with matrix $\bm{A}=(\bm{I}-\bm{s}\bm{s}^{T})\bm{V}$. The equivalent optimization in only $\bm{x}$ is expressed as follows: $$ \begin{split}\bm{x}_{*}&=\arg\max_{\bm{x}}\,\bm{c}^{T}\bm{x}\qquad\textrm{s.t.}\quad\bm{c}^{T}\bm{x}\geq 0,\\ \bm{A}\bm{x}&=\bm{0},\quad\bm{x}^{T}\bm{K}\bm{x}\leq\rho,\quad\bm{0}\leq\bm{x}\leq\bm{\tau},\end{split} \tag{24} $$ where the objective maximizes the panning gains $\bm{x}$ in the direction of vector $\bm{c}=\bm{V}^{T}\bm{s}$, consisting of cosine similarities between the target and loudspeaker unit-directions.
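The elimination of $\lambda$ leading to (24) can be checked numerically. The sketch below is an illustration only: `reduce_vbaps` and `unit` are hypothetical helper names, and the layout and gains are example values chosen for a centered steering direction. It builds $\bm{A}=(\bm{I}-\bm{s}\bm{s}^{T})\bm{V}$ column by column as $\bm{v}_{n}-\bm{s}(\bm{s}^{T}\bm{v}_{n})$ and the objective vector $\bm{c}=\bm{V}^{T}\bm{s}$.

```python
import math

def unit(deg):
    """Unit direction for an azimuth in degrees (x = front; an assumed convention)."""
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def reduce_vbaps(directions, s):
    """Return (A, c) with A = (I - s s^T) V as two rows and c = V^T s."""
    c = [v[0] * s[0] + v[1] * s[1] for v in directions]
    A = [[v[0] - s[0] * ci for v, ci in zip(directions, c)],
         [v[1] - s[1] * ci for v, ci in zip(directions, c)]]
    return A, c

# Example layout: center, fronts at -/+30 degrees, surrounds at -/+110 degrees.
layout = [unit(d) for d in (0, -30, 30, -110, 110)]
A, c = reduce_vbaps(layout, unit(0))

# Symmetric example gains: center fully on, fronts at half gain, surrounds off.
x = [1.0, 0.5, 0.5, 0.0, 0.0]
residual = [sum(a * xi for a, xi in zip(row, x)) for row in A]  # A x, should vanish
lam = sum(ci * xi for ci, xi in zip(c, x))                      # lambda = c^T x
```

For these gains the residual is zero, so the panning direction is aligned with $\bm{s}$, and $\lambda$ equals $1+\cos 30^{\circ}$.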
Moreover, the equality constraints restrict $\bm{x}$ to the null space of $\bm{A}$, which has nullity $N-1$. Thus, for real-time applications and a small number of loudspeakers $(N\leq 5)$, we remove the equality constraints and reduce the number of variables via the linear transformation of the panning gains $\bm{x}=\bar{\bm{A}}\bm{y}$ along an orthonormal basis $\bar{\bm{A}}^{T}\bar{\bm{A}}=\bm{I}$ of the null space, where the columns of $\bar{\bm{A}}\in\mathbb{R}^{N\times(N-1)}$ span $\textrm{ker}\left({\bm{A}}\right)$. The optimization in the kernel space reduces to linear and quadratic inequality constraints given by $$ \begin{split}\bm{y}_{*}&=\arg\max_{\bm{y}}\,\bar{\bm{c}}^{T}\bm{y}\quad\textrm{s.t.}\,\,\,\begin{array}{c}\bar{\bm{c}}^{T}\bm{y}\geq 0,\\ \bm{0}\leq\bar{\bm{A}}\bm{y}\leq\bm{\tau},\end{array}\,\,\,\bm{y}^{T}\bar{\bm{K}}\bm{y}\leq\rho,\end{split} \tag{25} $$ where $\bar{\bm{c}}=\bar{\bm{A}}^{T}\bm{c}$ and $\bar{\bm{K}}=\bar{\bm{A}}^{T}\bm{K}\bar{\bm{A}}$ (here $\bar{\bm{K}}$ denotes the kernel-space covariance rather than the plane-wave covariance in (21)), and the feasible region is convex. Lastly, the steering direction $\bm{s}$ can be infeasible where only the trivial solution $\bm{x}=\bm{0}$ satisfies the VBAPS equality constraint; dropping the VBAPS constraints $\bm{A}\bm{x}=\bm{0}$ and $\bm{c}^{T}\bm{x}\geq 0$ in the primary form (24) relaxes the feasible space to be convex. Therefore, optimal solutions for both the null space (25) and relaxed primary forms can be efficiently found via interior-point methods. Let us now investigate the solutions to (23), (24), (25) under various acoustic power, covariance, and loudspeaker layouts in practical applications.
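One way to obtain an orthonormal null-space basis for the change of variables in (25) is sketched below. This is an assumption about implementation, not a method stated in the text: because $\bm{I}-\bm{s}\bm{s}^{T}$ projects onto the single direction perpendicular to $\bm{s}$ in 2D, $\bm{A}\bm{x}=\bm{0}$ is equivalent to one scalar condition $\bm{a}^{T}\bm{x}=0$, and a Householder reflection then yields $N-1$ orthonormal columns orthogonal to $\bm{a}$. The names `kernel_basis` and `unit` are hypothetical.

```python
import math

def unit(deg):
    """Unit direction for an azimuth in degrees (x = front; an assumed convention)."""
    a = math.radians(deg)
    return (math.cos(a), math.sin(a))

def kernel_basis(directions, s):
    """Orthonormal basis of ker(A), A = (I - s s^T) V, as N rows of N-1 entries."""
    p = (-s[1], s[0])                      # unit vector perpendicular to s
    a = [v[0] * p[0] + v[1] * p[1] for v in directions]
    norm = math.sqrt(sum(ai * ai for ai in a))
    if norm < 1e-12:
        raise ValueError("all loudspeakers are collinear with s, so A is zero")
    w = [ai / norm for ai in a]
    w[0] += 1.0 if w[0] >= 0 else -1.0     # Householder vector a_hat +/- e_1
    ww = sum(wi * wi for wi in w)
    n = len(a)
    # H = I - 2 w w^T / (w^T w) maps e_1 onto -/+ a_hat; its other columns span ker(A).
    H = [[(1.0 if i == j else 0.0) - 2 * w[i] * w[j] / ww for j in range(n)]
         for i in range(n)]
    return [row[1:] for row in H]

s = unit(0)
layout = [unit(d) for d in (0, -30, 30, -110, 110)]
B = kernel_basis(layout, s)                # 5 x 4 basis for this example layout

# Any y maps to gains x = B y that satisfy the VBAPS equality constraint A x = 0.
y = [0.3, -0.2, 0.1, 0.4]
x = [sum(bij * yj for bij, yj in zip(row, y)) for row in B]
```

A singular value decomposition of $\bm{A}$ would serve equally well; the reflection is used here only because it needs no linear algebra library for small $N$.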
## 4 Experiments

Distributed Center Channel: In the $5.0$ multichannel standard, the center content channel is fully sent to a center loudspeaker in a $5.0$ ITU layout (left = $-30^{\circ}$, right = $30^{\circ}$, center = $0^{\circ}$, surround left = $-110^{\circ}$, surround right = $110^{\circ}$), where the maximum acoustic power (unity) is limited to that of a single loudspeaker. Under OPSE, we can specify a larger acoustic power target $\rho$ via the equality constraint $\bm{x}^{T}\bm{K}\bm{x}=\rho$, spatial panning constraints of a center steering direction $\bm{s}=\left[{1;0}\right]$, and unity electrical constraints $\bm{x}\leq\bm{1}$ WLOG. The optimal panning sensitivity gains for the listener location’s acoustic covariance $\bm{K}=\bm{1}\bm{1}^{T}$ are shown in Fig. 5 for increasing acoustic power targets $\rho$. For acoustic power targets $0<\rho\leq 1$, only the center loudspeaker is active $0<x_{C}\leq 1$, and panning sensitivity is maximum. For $1<\rho\leq 9$, the center loudspeaker exhausts its headroom and the left and right loudspeakers equally engage $(0<x_{L,R}\leq 1,\,x_{C}=1)$, resulting in a slight loss in panning sensitivity ($0.9$ at $\rho=9$) and an increase in both panning/electric and acoustic/electric efficiency. For $9<\rho\leq 25$, the left and right loudspeakers exhaust their headroom and the surround loudspeakers equally engage $(0<x_{SL,SR}\leq 1,\,x_{L,R,C}=1)$, resulting in a sharper loss in panning sensitivity and a degradation in panning/electric efficiency as the center steering direction lies in the infeasible sector of the surround loudspeaker pair. Note that for inequality constraints $\bm{x}^{T}\bm{K}\bm{x}\leq\rho$, the surround panning gains remain inactive as the quadratic constraint is not tight for $\rho>9$. Panning sensitivity therefore monotonically decreases for larger acoustic power targets.
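The three regimes described for the distributed center channel can be reproduced with a short calculation. The sketch below is not the OPSE solver: it assumes the symmetric $5.0$ layout, $\bm{K}=\bm{1}\bm{1}^{T}$, unit headroom, and the equality constraint, under which $\bm{x}^{T}\bm{K}\bm{x}=\rho$ reduces to $\bm{x}^{T}\bm{1}=\sqrt{\rho}$ and the gains can be filled greedily in order of cosine similarity to the center direction. The name `distributed_center` and the group table are hypothetical.

```python
import math

# (label, azimuth magnitude in degrees, loudspeakers in the symmetric group)
GROUPS = [("center", 0.0, 1), ("fronts", 30.0, 2), ("surrounds", 110.0, 2)]

def distributed_center(rho):
    """Greedy gains and sensitivity for K = 1 1^T, s = center, x <= 1, x^T K x = rho.

    Only valid for this symmetric layout and 0 < rho <= 25 (five unit-headroom
    loudspeakers); symmetric groups share one gain so that A x = 0 holds.
    """
    if not 0.0 < rho <= 25.0:
        raise ValueError("rho must lie in (0, 25] for five unit-headroom loudspeakers")
    total = math.sqrt(rho)                 # required sum of non-negative gains
    remaining, lam, gains = total, 0.0, {}
    for label, azimuth, count in GROUPS:
        g = min(1.0, max(0.0, remaining / count))
        gains[label] = g
        remaining -= g * count
        lam += count * g * math.cos(math.radians(azimuth))
    return gains, lam / total              # sensitivity (17) = lambda / x^T 1

gains_1, sens_1 = distributed_center(1.0)     # center only
gains_9, sens_9 = distributed_center(9.0)     # center and fronts at full gain
gains_25, sens_25 = distributed_center(25.0)  # all five at full gain
```

At $\rho=9$ the sensitivity evaluates to $(1+2\cos 30^{\circ})/3$, consistent with the value of roughly $0.9$ quoted in the text, and it falls further once the surround pair engages.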
<details> <summary>figs/vary_acoustic_pow.png Details</summary>

![0440545a](/v1/image/0440545a0068bfcb8a33d2f988ab24ee0fd7c1b269eca2024de7610d9dc229f9)

### Visual Description

Line chart titled "Optimal Panning Gains for Varying Acoustic Power Targets".

* **X-axis:** Acoustic Power Target ρ for the constraint `xᵀKx = ρ`, with `K = 11ᵀ` and `x ≤ 1`; range 0 to 25.
* **Left Y-axis:** Gain; range 0 to 1.5.
* **Right Y-axis:** Efficiency; range 0 to 7.
* **Legend:** Sensitivity (`λ/xᵀ1`), Center (`x_C`), Fronts (`x_L, x_R`), Surrounds (`x_SL, x_SR`), Panning/Electric (`λ²/xᵀx`), Panning/Acoustic (`λ²/xᵀKx`), Acoustic/Electric (`xᵀKx/xᵀx`).

</details>

Figure 5: OPSE center content more uniformly distributes across $5.0$ ITU loudspeakers for increasing acoustic power targets $\rho$, and constant electrical headroom.
Diffuse-field Panning: In reverberant environments, acoustic covariance between well-separated loudspeakers in the listening area decreases due to increasing variations in acoustic reflection path responses. Normalized loudspeakers produce a mixture of correlated sound-fields from their direct acoustic paths, and less correlated diffuse-fields from their reflection paths over a listening area. The acoustic covariance in the listening area is therefore proportional to (22). Let us reconsider the previous case of a distributed center channel over a $3.0$ ITU layout (left = $-30^{\circ}$, right = $30^{\circ}$, center = $0^{\circ}$). Under OPSE, we constrain the acoustic power to unity $\bm{x}^{T}\bm{K}\bm{x}=1$, relax the electrical headroom $\bm{x}\leq\bm{10}$, and vary the mixture of acoustic covariances as shown in Fig. 6. For correlated sound-fields $0\leq\alpha\leq 1-\bm{s}^{T}\bm{v}_{L}$, only the center loudspeaker is active as panning sensitivity is maximum. For less correlated sound-fields $1-\bm{s}^{T}\bm{v}_{L}<\alpha\leq 1$, the center loudspeaker attenuates relative to the left and right loudspeakers as more uniformly distributed gains yield both higher acoustic/panning and panning/electric efficiency. The gap between acoustic/electric efficiency and its theoretical Rayleigh quotient maximum, given by the largest eigenvalue of $\bm{K}$, closes at the diffuse-field limit $\alpha=1$. OPSE therefore converges to the eigenvector of $\bm{K}$ associated with its largest eigenvalue under diffuse-field conditions where source-localization is difficult.
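The Rayleigh-quotient bound on acoustic/electric efficiency, $\bm{x}^{T}\bm{K}\bm{x}/\bm{x}^{T}\bm{x}\leq\lambda_{\max}(\bm{K})$, can be illustrated numerically. The sketch below uses a made-up symmetric positive-definite matrix `K` purely for illustration; its entries are an assumption and are not derived from (22).

```python
import numpy as np

def acoustic_electric_efficiency(x, K):
    """Rayleigh quotient x^T K x / x^T x."""
    return float(x @ K @ x / (x @ x))

# Hypothetical 3-loudspeaker covariance (illustrative values only).
K = np.array([[1.0, 0.6, 0.6],
              [0.6, 1.0, 0.3],
              [0.6, 0.3, 1.0]])

eigvals, eigvecs = np.linalg.eigh(K)        # eigenvalues in ascending order
lam_max = float(eigvals[-1])
v_max = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue

# Random non-negative gain vectors never exceed the bound.
rng = np.random.default_rng(0)
trials = rng.uniform(0.0, 1.0, size=(1000, 3))
best_random = max(acoustic_electric_efficiency(x, K) for x in trials)

print(lam_max, acoustic_electric_efficiency(v_max, K), best_random)
```

The bound is attained only along `v_max`, which is why the efficiency gap closes when the optimal gains align with that eigenvector.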
<details> <summary>figs/vary_alpha.png Details</summary>

![a3b08de3](/v1/image/a3b08de303caa955e78be5f63c7579e18c2d72f1a70bd2c495ba997b1448ef7f)

### Visual Description

Line chart titled "Optimal Panning Gains for Varying Acoustic Covariances".

* **X-axis:** Mixture Coefficient α; range 0 to 1, with a marked point at 0.1340. Constraints `xᵀKx = 1`, `x ≤ 10`, with `K` the covariance mixture.
* **Left Y-axis:** Gain; range 0 to 1.4.
* **Right Y-axis:** Efficiency; range 0 to 7.
* **Legend:** Sensitivity (`λ/xᵀ1`), Center (`x_C`), Fronts (`x_L, x_R`), Panning/Electric (`λ²/xᵀx`), Panning/Acoustic (`λ²/xᵀKx`), Acoustic/Electric (`xᵀKx/xᵀx`).

</details>

Figure 6: OPSE center content gains for $3.0$ ITU loudspeakers converge to the acoustic/electric Rayleigh quotient maximizer in diffuse-field conditions.

Circular Panning Across Loudspeaker Layouts: For adaptive multichannel reproduction, it is desirable to render content channels over common loudspeaker layouts shown in Fig. 7 for any listener location and front-direction. Under OPSE, we can evaluate the panning sensitivity for all steering directions in azimuth in both anechoic $\bm{K}=\bm{1}\bm{1}^{T}$ and diffuse-field $\bm{K}=\bm{I}$ conditions. Let us constrain the acoustic power to unity $\bm{x}^{T}\bm{K}\bm{x}=1$, relax the electrical headroom $\bm{x}\leq\bm{10}$, and vary $\bm{s}=[\cos\theta;\,\sin\theta]$ over the half-circle $0\leq\theta\leq\pi$, as the layouts are symmetric w.r.t. $\theta=0$. For layouts with only frontal loudspeakers such as LRC and wide LRC, the panning sensitivity remains high ($>0.85$) for feasible steering directions.
For infeasible steering directions, the VBAPS constraints are dropped in (24), and the panning sensitivity, taken to be $\bm{c}^{T}\bm{x}/\bm{x}^{T}\bm{1}$, decreases for larger $\theta$. The solutions are continuous w.r.t. $\theta$ for the anechoic covariance but discontinuous for the diffuse-field covariance at the feasibility boundary of $\theta$. For triangular loudspeaker layouts (surround LRC, LRRear) containing the listener, only two of the three loudspeakers are active for any given $\theta$. The solutions therefore uniquely satisfy the VBAPS constraints and are equivalent in both anechoic and diffuse-field conditions. LRRear has acceptable panning sensitivity for $\left|{\theta}\right|\leq 30^{\circ}$, but minimal panning sensitivity near surround steering angles $100^{\circ}\leq\theta\leq 110^{\circ}$. Surround LRC has low panning sensitivity for the left and right steering angles $\theta=\pm 30^{\circ}$. For the LRSLSR layout, the panning sensitivity degrades in diffuse-field conditions for frontal angles $\left|{\theta}\right|\leq 60^{\circ}$, and is minimal in the surround loudspeaker pair’s gap $110^{\circ}\leq\theta\leq 250^{\circ}$. For the pentagon layout of uniformly spaced loudspeakers, anechoic and diffuse-field conditions have acceptable ($>0.8$) and borderline ($>0.7$) panning sensitivity respectively, with the latter also having lower variance. Under OPSE, the pentagon layout is therefore suited for uniform directional circular panning, LRSLSR for non-rear directional panning, and wide LRC for frontal to semi-surround directional panning for content reproduction.

<details> <summary>figs/vary_layouts_mixed.png Details</summary>

![e0240f90](/v1/image/e0240f904cf66f9e5be8e2415c3a1cb6b35d8fdd2212d0e8caabc78aa45e9453)

### Visual Description

Figure titled "Maximum Circular Panning Sensitivity for Common Loudspeaker Layouts", with three panels.

* **Polar plot (left):** "Loudspeaker Layouts", showing the loudspeaker positions of each layout around a central "Listener".
* **Line graph 1:** "Acoustic Covariance: K = 11ᵀ".
* **Line graph 2:** "Acoustic Covariance: K = I".
* **X-axis (line graphs):** Steering Direction Azimuth (Degrees); range 0° to 180°.
* **Y-axis (line graphs):** Panning Sensitivity; range 0.0 to 1.0.
* **Legend:** LRC, Wide LRC, Surround LRC, LRRear, LRSLSR, Pentagon.

</details>

Figure 7: Maximum panning sensitivity in (24) varies across azimuth steering directions for different loudspeaker layouts, and in anechoic and diffuse-field conditions. All triangle arrangements have low-sensitivity gaps. A minimum of five uniformly spaced loudspeakers (Pentagon) achieves moderate sensitivity in diffuse-field conditions across all of azimuth.

## 5 Discussion

While Bayesian loudspeaker normalization and OPSE formulations are acoustic measurement-free sound-field correction methods, their accuracy may be improved with indirect acoustic measurements.
Loudspeaker acoustic covariances at the listening area, critical distances, and distance attenuation rates may be estimated from simplified room acoustic models such as image-sources [29] if the room dimensions are known, and from acoustic transfer functions between loudspeakers and microphones co-located on supporting smart-loudspeaker devices. In such instances, we may couple loudspeaker normalization with OPSE by substituting the in-situ loudspeaker acoustic covariance estimates in place of the anechoic and plane-wave mixture. Extension to 3D layouts can be considered for some loudspeaker arrangements, but may be impractical for satisfying VBAPS constraints and ensuring feasible coverage over spherical coordinates.

## 6 Conclusion

We presented a loudspeaker filtering method that normalizes multiple loudspeakers to a common acoustic target for a non-stationary listening location or area. Loudspeaker normalization angles w.r.t. the listener location were adapted via Bayesian posteriors over circular distribution probability density functions. We then formulated panning gain optimization problems by relaxing VBAP constraints to give a novel panning sensitivity / efficiency objective, and specified electrical and acoustical domain constraints. Augmented, primary, null-space, and relaxed forms of the OPSE problem were derived. Lastly, practical experiments quantified the OPSE solutions in applications for distributing excess content channels, evaluating efficiency in anechoic to diffuse-field conditions, and recommending loudspeaker layouts for circular panning and multichannel content reproduction.
## 7 Appendix

Circular Distribution: We can evaluate $P(-a\leq\theta\leq a)$ for $f(\theta)$ in (11) via the power-series approximation and cosine-integral [30] given by

$$ \begin{split}\int_{-a}^{a}e^{\frac{-d^{2}(\theta)}{2\ell^{2}}}d\theta&=\int_{-a}^{a}e^{\frac{\cos(\theta)-1}{\ell^{2}}}d\theta=e^{-\frac{1}{\ell^{2}}}\sum_{n=0}^{\infty}\frac{\int_{-a}^{a}\cos^{n}(\theta)d\theta}{\ell^{2n}n!}.\end{split} \tag{26} $$

The product of circular distributions is a circular distribution, as the sum of weighted and phase-shifted cosines in the exponents’ terms is a cosine given by

$$ \begin{split}f_{i}(\theta)&\propto e^{\frac{-d^{2}(\theta-\mu_{i})}{2\ell_{i}^{2}}}\propto e^{\frac{\cos(\theta-\mu_{i})}{\ell_{i}^{2}}}\quad\Rightarrow\\ f_{i}(\theta)f_{j}(\theta)&\propto e^{\frac{\ell_{j}^{2}\cos(\theta-\mu_{i})+\ell_{i}^{2}\cos(\theta-\mu_{j})}{\ell_{i}^{2}\ell_{j}^{2}}}=e^{\frac{\cos(\theta-\mu_{ij})}{\ell_{ij}^{2}}},\end{split} \tag{27} $$

where the mean and dispersion of the product are given by

$$ \begin{split}\mu_{ij}&=\operatorname{atan2}\left({\frac{\sin(\mu_{i})}{\ell_{i}^{2}}+\frac{\sin(\mu_{j})}{\ell_{j}^{2}},\,\frac{\cos(\mu_{i})}{\ell_{i}^{2}}+\frac{\cos(\mu_{j})}{\ell_{j}^{2}}}\right),\\ \ell_{ij}^{2}&=\ell_{i}^{2}\ell_{j}^{2}\left({\ell_{i}^{4}+2\cos(\mu_{i}-\mu_{j})\ell_{i}^{2}\ell_{j}^{2}+\ell_{j}^{4}}\right)^{-\frac{1}{2}}.\end{split} \tag{28} $$

Unlike that of a normal distribution, the posterior dispersion depends on both the prior means and dispersions. Substituting the empirical mean $\mu_{i}=\bm{\bar{\theta}}^{\left\{{t}\right\}}_{n}$, dispersion $\ell_{i}=\bar{\ell}_{n}^{\left\{{t}\right\}}$, and the prior’s mean $\mu_{j}=\mu_{n}^{\left\{{t-1}\right\}}$, dispersion $\ell_{j}=\ell_{n}^{\left\{{t-1}\right\}}$ in (13) gives the posterior mean $\mu_{ij}=\mu_{n}^{\left\{{t}\right\}}$ and dispersion $\ell_{ij}=\ell_{n}^{\left\{{t}\right\}}$.
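The product rule in (28) is straightforward to check numerically. The sketch below is an illustration only: the function name `circular_product` and the sample means and dispersions are assumptions chosen for the example. It computes the product's mean and dispersion and verifies that the summed exponents of the two factors equal a single cosine exponent at every angle.

```python
import numpy as np

def circular_product(mu_i, ell_i, mu_j, ell_j):
    """Mean and dispersion of the product of two circular distributions, per (28)."""
    w_i, w_j = 1.0 / ell_i**2, 1.0 / ell_j**2
    mu = np.arctan2(w_i * np.sin(mu_i) + w_j * np.sin(mu_j),
                    w_i * np.cos(mu_i) + w_j * np.cos(mu_j))
    ell_sq = (ell_i**2 * ell_j**2) / np.sqrt(
        ell_i**4 + 2.0 * np.cos(mu_i - mu_j) * ell_i**2 * ell_j**2 + ell_j**4)
    return float(mu), float(np.sqrt(ell_sq))

# Hypothetical measurement (i) and prior (j) parameters, in radians.
mu_i, ell_i = 0.3, 0.5
mu_j, ell_j = 1.0, 0.8
mu_ij, ell_ij = circular_product(mu_i, ell_i, mu_j, ell_j)

# The exponents must agree for all angles, not just up to a constant.
theta = np.linspace(-np.pi, np.pi, 721)
lhs = np.cos(theta - mu_i) / ell_i**2 + np.cos(theta - mu_j) / ell_j**2
rhs = np.cos(theta - mu_ij) / ell_ij**2
print(mu_ij, ell_ij, float(np.abs(lhs - rhs).max()))
```

Calling the function repeatedly with the previous output as the prior gives a recursive update of the kind described for the posterior in the paragraph above.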
Plane-wave Covariance: Let $p(\bm{r})=e^{-j\kappa\bm{v}^{T}\bm{r}}$ be the $2$D plane-wave equation with incident direction $\bm{v}$, and the region of integration be the $2$D disc of radius $R$ where $\bm{r}=\left[{x,y}\right]^{T}$, $-R\leq x\leq R$, $\left|{y}\right|\leq\sqrt{R^{2}-x^{2}}$. We may express the incident angle and evaluation point in polar coordinates as follows:

$$ \begin{split}\bm{v}&=\left\lVert\bm{v}\right\rVert\left[{\cos\theta_{v},\,\sin\theta_{v}}\right]^{T},\quad\bm{r}=r\left[{\cos\theta,\,\sin\theta}\right]^{T},\\ \bm{v}^{T}\bm{r}&=\left\lVert\bm{v}\right\rVert r\left({\cos\theta_{v}\cos\theta+\sin\theta_{v}\sin\theta}\right)\\ &=\left\lVert\bm{v}\right\rVert r\cos\left({\theta-\theta_{v}}\right),\\ -\bm{v}^{T}\bm{r}&=\left\lVert\bm{v}\right\rVert r\cos\left({\pi-\theta+\theta_{v}}\right),\quad\textrm{Cosine reflection}\end{split} \tag{29} $$

where $0\leq r\leq R$ and $0\leq\theta\leq 2\pi$; integrating over the full period of $\theta$ removes the dependence on the plane-wave incident angle $\theta_{v}$ in subsequent integrals. The first moment has a closed form in terms of the Bessel function of the first kind $J_{n}(x)$ and is given by

$$ \begin{split}\mathbb{E}\left[{p(\bm{r})}\right]&=\frac{1}{\pi R^{2}}\int_{0}^{2\pi}\int_{0}^{R}re^{-j\kappa\bm{v}^{T}\bm{r}}\,dr\,d\theta\\ &=\frac{1}{\pi R^{2}}\int_{0}^{R}r\int_{0}^{2\pi}e^{j\kappa\left\lVert\bm{v}\right\rVert r\cos\left({\pi-\theta+\theta_{v}}\right)}\,d\theta\,dr\\ &=\frac{2}{R^{2}}\int_{0}^{R}rJ_{0}(\kappa\left\lVert\bm{v}\right\rVert r)\,dr\quad\textrm{Hansen-Bessel [31]}\\ &=\frac{2J_{1}(\kappa\left\lVert\bm{v}\right\rVert R)}{\kappa\left\lVert\bm{v}\right\rVert R}.\quad\textrm{Bessel integral identity [32]}\end{split} \tag{30} $$

Let $p_{n}(\bm{r})=e^{-j\kappa\bm{v}_{n}^{T}\bm{r}}$ be the plane-wave equation of the $n^{th}$ loudspeaker.
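The closed form in (30) can be checked against direct numerical integration over the disc. The sketch below is a verification aid under stated assumptions: the wavenumber, radius, and incident direction are arbitrary example values, and `bessel_j` evaluates Bessel's integral $J_{n}(x)=\frac{1}{\pi}\int_{0}^{\pi}\cos(n\tau-x\sin\tau)\,d\tau$ with a midpoint rule so that only NumPy is needed (a library routine for $J_{1}$ could be used instead).

```python
import numpy as np

def bessel_j(n, x, num=2000):
    """J_n(x) via Bessel's integral, midpoint rule on [0, pi]."""
    tau = (np.arange(num) + 0.5) * np.pi / num
    return float(np.mean(np.cos(n * tau - x * np.sin(tau))))

def disc_mean_plane_wave(kappa, v, R, n_r=400, n_theta=400):
    """Average of exp(-j kappa v^T r) over the disc of radius R (midpoint quadrature)."""
    r = (np.arange(n_r) + 0.5) * R / n_r
    theta = (np.arange(n_theta) + 0.5) * 2.0 * np.pi / n_theta
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    phase = kappa * (v[0] * rr * np.cos(tt) + v[1] * rr * np.sin(tt))
    integrand = rr * np.exp(-1j * phase)            # Jacobian r included
    d_area = (R / n_r) * (2.0 * np.pi / n_theta)
    return integrand.sum() * d_area / (np.pi * R**2)

# Arbitrary example values (assumptions for illustration).
kappa, R = 10.0, 0.5
v = np.array([np.cos(0.7), np.sin(0.7)])            # unit incident direction

z = kappa * np.linalg.norm(v) * R
closed_form = 2.0 * bessel_j(1, z) / z
numeric = disc_mean_plane_wave(kappa, v, R)
print(closed_form, numeric)
```

The numerical mean is real up to quadrature error and does not change with the incident angle, as expected from the cosine-reflection argument in (29).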
The second moment follows from substituting $\bm{v}=\bm{v}_{m}-\bm{v}_{n}$ in (30), which gives

$$ \begin{split}\mathbb{E}\left[{p_{m}(\bm{r})p_{n}^{*}(\bm{r})}\right]&=\frac{1}{\pi R^{2}}\int_{0}^{2\pi}\int_{0}^{R}re^{-j\kappa\left({\bm{v}_{m}-\bm{v}_{n}}\right)^{T}\bm{r}}\,dr\,d\theta\\ &=\left\{\begin{array}{cc}\frac{2J_{1}\left({\kappa\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert R}\right)}{\kappa\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert R},&\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert>0\\ 1,&\left\lVert\bm{v}_{m}-\bm{v}_{n}\right\rVert=0\end{array}\right..\end{split} \tag{31} $$

## References

- Recommendation [2022] Recommendation, I., “ITU-R Recommendation BS.775-4,” Multi-channel stereophonic sound system with or without accompanying picture, 2022.
- De Sena et al. [2013] De Sena, E., Hacıhabiboğlu, H., and Cvetković, Z., “Analysis and design of multichannel systems for perceptual sound field reconstruction,” IEEE Trans. Audio Speech Lang. Process., 21(8), pp. 1653–1665, 2013.
- López et al. [2005] López, J. J., González, A., and Fuster, L., “Room compensation in wave field synthesis by means of multichannel inversion,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., pp. 146–149, IEEE, 2005.
- Brännmark et al. [2013] Brännmark, L.-J., Bahne, A., and Ahlén, A., “Compensation of loudspeaker–room responses in a robust MIMO control framework,” IEEE Trans. Audio Speech Lang. Process., 21(6), pp. 1201–1216, 2013.
- Jin and Kleijn [2015] Jin, W. and Kleijn, W. B., “Theory and design of multizone soundfield reproduction using sparse methods,” IEEE/ACM Trans. Audio Speech Lang. Process., 23(12), pp. 2343–2355, 2015.
- Talagala et al. [2014] Talagala, D. S., Zhang, W., and Abhayapala, T. D., “Efficient multi-channel adaptive room compensation for spatial soundfield reproduction using a modal decomposition,” IEEE/ACM Trans. Audio Speech Lang. Process., 22(10), pp. 1522–1532, 2014.
- Kolundzija et al. [2011] Kolundzija, M., Faller, C., and Vetterli, M., “Reproducing sound fields using MIMO acoustic channel inversion,” Journal of the Audio Engineering Society, 59(10), pp. 721–734, 2011.
- Poulose et al. [2019] Poulose, A., Eyobu, O. S., and Han, D. S., “An indoor position-estimation algorithm using smartphone IMU sensor data,” IEEE Access, 7, pp. 11165–11177, 2019.
- Won et al. [2009] Won, S.-h. P., Melek, W. W., and Golnaraghi, F., “A Kalman/particle filter-based position and orientation estimation method using a position sensor/inertial measurement unit hybrid system,” IEEE Transactions on Industrial Electronics, 57(5), pp. 1787–1798, 2009.
- Daniş and Cemgil [2017] Daniş, F. S. and Cemgil, A. T., “Model-based localization and tracking using bluetooth low-energy beacons,” Sensors, 17(11), p. 2484, 2017.
- Wang et al. [2013] Wang, Y., Yang, X., Zhao, Y., Liu, Y., and Cuthbert, L., “Bluetooth positioning using RSSI and triangulation methods,” in 2013 IEEE 10th Consumer Communications and Networking Conference (CCNC), pp. 837–842, IEEE, 2013.
- Kim et al. [2016] Kim, K.-W., Kwon, J., Lee, C.-G., and Han, J., “Accurate indoor location tracking exploiting ultrasonic reflections,” IEEE Sensors Journal, 16(24), pp. 9075–9088, 2016.
- Greenfield and Hawksford [1991] Greenfield, R. and Hawksford, M. J., “Efficient filter design for loudspeaker equalization,” Journal of the Audio Engineering Society, 39(10), pp. 739–751, 1991.
- Mäkivirta et al. [2018] Mäkivirta, A., Liski, J., and Välimäki, V., “Modeling and delay-equalizing loudspeaker responses,” AES: Journal of the Audio Engineering Society, 66(11), pp. 922–934, 2018.
- Pulkki [1997] Pulkki, V., “Virtual sound source positioning using vector base amplitude panning,” Journal of the Audio Engineering Society, 45(6), pp. 456–466, 1997.
- Pulkki et al. [2001] Pulkki, V. et al., Spatial sound generation and perception by amplitude panning techniques, Helsinki University of Technology Ph.D. dissertation, 2001.
- Pulkki [1999] Pulkki, V., “Uniform spreading of amplitude panned virtual sources,” in Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 187–190, IEEE, 1999.
- Franck et al. [2017] Franck, A., Fazi, F. M., and Hamdan, E., “An optimization approach to control sound source spread with multichannel amplitude panning,” in 24th International Congress on Sound and Vibration, 2017.
- Luo [2023] Luo, Y., “Active Barycentric Beamformed Stereo Upmixing,” in 2023 31st European Signal Processing Conference (EUSIPCO), pp. 316–320, 2023.
- Sadek and Kyriakakis [2004] Sadek, R. and Kyriakakis, C., “A novel multichannel panning method for standard and arbitrary loudspeaker configurations,” 2004.
- Batke and Keiler [2010] Batke, J.-M. and Keiler, F., “Investigation of Robust Panning Functions for 3-D Loudspeaker Setups,” in Audio Engineering Society Convention 128, Audio Engineering Society, 2010.
- Oppenheim [1999] Oppenheim, A. V., Discrete-time signal processing, Pearson Education India, 1999.
- Alizadeh and Goldfarb [2003] Alizadeh, F. and Goldfarb, D., “Second-order cone programming,” Mathematical Programming, 95(1), pp. 3–51, 2003.
- Horn and Johnson [1990] Horn, R. and Johnson, C., Matrix Analysis, Cambridge University Press, 1990, ISBN 9780521386326.
- Sanathanan and Koerner [2003] Sanathanan, C. and Koerner, J., “Transfer function synthesis as a ratio of two complex polynomials,” IEEE Trans. Autom. Control, 8(1), pp. 56–58, 2003.
- Lai and Lin [2010] Lai, X. and Lin, Z., “Minimax design of IIR digital filters using a sequential constrained least-squares method,” IEEE Trans. Signal Process, 58(7), pp. 3901–3906, 2010.
- Toole [2006] Toole, F. E., “Loudspeakers and rooms for sound reproduction—A scientific review,” Journal of the Audio Engineering Society, 54(6), pp. 451–476, 2006.
- Luo [2021] Luo, Y., “Spherical harmonic covariance and magnitude function encodings for beamformer design,” EURASIP Journal on Audio, Speech, and Music Processing, 2021, pp. 1–17, 2021.
- Lehmann and Johansson [2008] Lehmann, E. A. and Johansson, A. M., “Prediction of energy decay in room impulse responses simulated with an image-source model,” The Journal of the Acoustical Society of America, 124(1), pp. 269–277, 2008.
- Weisstein [2025] Weisstein, E. W., “Cosine Integral,” https://mathworld.wolfram.com/CosineIntegral.html, 2025, From MathWorld–A Wolfram Web Resource.
- Temme [1996] Temme, N. M., Special Functions: An Introduction to the Classical Functions of Mathematical Physics, pp. 228–231, Wiley-Interscience, 1996.
- Weisstein [2023] Weisstein, E. W., “Bessel Function of the First Kind,” https://mathworld.wolfram.com/BesselFunctionoftheFirstKind.html, 2023, From MathWorld–A Wolfram Web Resource.
