2310.17004v2


# Improved Panning on Non-Equidistant Loudspeakers with Direct Sound Level Compensation

**Authors**: Jan-Hendrik Hanschke, Daniel Arteaga, Giulio Cengarle, Joshua Lando, Mark R. P. Thomas, Alan Seefeldt

Dolby Laboratories

Correspondence should be addressed to Jan-Hendrik Hanschke (janhendrikhanschke@ieee.org)

## ABSTRACT

Loudspeaker rendering techniques that create phantom sound sources often assume an equidistant loudspeaker layout. Typical home setups might not fulfill this condition, as loudspeakers deviate from canonical positions, thus requiring a corresponding calibration. The standard approach is to compensate for delays and to match the loudness of each loudspeaker at the listener's location. It was found that a shift of the phantom image occurs when this calibration procedure is applied and one of a pair of loudspeakers is significantly closer to the listener than the other. In this paper, a novel approach to panning on non-equidistant loudspeaker layouts is presented, whereby the panning position is governed by the direct sound and the perceived loudness is governed by the full impulse response. Subjective listening tests are presented that validate the approach and quantify the perceived effect of the compensation. In a setup where the standard calibration leads to an average error of 10°, the proposed direct sound compensation largely returns the phantom source to its intended position.

## 1 Introduction

In stereo or multichannel loudspeaker setups, a virtual or phantom source is a sound that appears to emanate from a position other than the physical loudspeaker locations [1].
The most common rendering techniques for creating such phantom sources are based on stereo amplitude panning and its multichannel extensions (e.g., vector-base amplitude panning [2], dual/triple balance amplitude panning [3], distance-based amplitude panning [4]). These panning methods distribute the source signal among several loudspeakers, assigning a gain to each loudspeaker so that the resulting sound mixture creates the illusion of a phantom sound source coming from the intended direction. Amplitude panning techniques are commonly used in professional content creation tools for cinema, music and multimedia.

With traditional channel-based formats, panning to channels takes place on the content creation side, addressing a small discrete set of canonical playback configurations (e.g., stereo, 5.1). These channel-based renderings are then played back on consumer systems where the loudspeaker positions may deviate from the canonical locations, causing a mismatch in angle and perceived level. These inaccuracies shift the perceived position of a phantom source away from its intended position. Object-based audio [5], which uses a renderer in the playback device together with knowledge of the loudspeaker layout, opens the door to modifying the relative gains of individual sources based on the actual loudspeaker locations and the acoustic characteristics of the playback system.

So far, most rendering techniques, including those that allow for flexible positioning of loudspeakers and process object-based audio, depend only on the angular position of the loudspeakers relative to the listener. The distance between each loudspeaker and the listening position is assumed to be equal, even though this may not hold in common home setups.
In the case of unequal distances, the state-of-the-art approach is to time-align and loudness-match the different loudspeakers [2], with the loudness estimated from the full room response of each loudspeaker; we refer to this as full response compensation (FRC). In the authors' experience, this calibration approach fails when rendering content to layouts with non-equidistant loudspeakers, causing the phantom source to be systematically pulled towards the closest loudspeaker(s).

On closer consideration of what determines the position of a phantom source, loudness matching appears to be at least partially at odds with the well-established psychoacoustic principle of the Haas or precedence effect [6, 7]. When a sound is followed by a delayed version of itself with a time delay of approximately 1 ms or more (but less than the echo threshold), a single auditory event is perceived from the direction of the first arriving wavefront. As a consequence, the perceived direction of a single physical sound source in a room is dominated by sound on the direct path from the source to the listener, not by later arriving room reflections [8]. For time delays smaller than approximately 1 ms, the related summing localization principle [9] states that multiple wavefronts fuse into a phantom source whose perceived direction is a combination of those of the individual wavefronts. For panning across multiple time-aligned loudspeakers, this strongly suggests that the quantity determining the phantom source location is the direct sound from each loudspeaker, possibly including early reflections arriving within 1 ms, and not the overall loudness, which includes the entire reverberation tail.
Although some literature in the context of equalization hints that the direct sound plays a dominant role in the localization and timbre perception of sound [10, 11], and another study uses the anechoic decay as a simplified room model in a panning function [12], we are not aware of systematic studies of this phenomenon in the context of loudspeaker rendering, nor of any practical implementations. We study and propose a modified panning approach for non-equidistant loudspeakers based on the combined contributions of the direct sound from multiple loudspeakers, and show empirically that it improves phantom source localization accuracy. To achieve loudness consistency across phantom source locations, the total response of the loudspeakers is considered at the same time.

The paper is organized as follows: In Sec. 2 we introduce relevant quantities based on the sound decay model and explain the full response compensation method for level and delay compensation. Sec. 3 then covers the proposed approach to restore the intended phantom source position based on the direct sound contribution while maintaining loudness consistency. In Sec. 4 we describe a subjective listening test to validate the proposed approach, and in Sec. 5 we present its results. These results and the main outcomes of the paper are discussed in Sec. 6.

## 2 Fundamentals

## 2.1 Distance-based level decay of loudspeakers in reverberant rooms

As a loudspeaker plays a signal in a room, the direct sound (the sound traveling on the shortest path from the loudspeaker to the listener) is quickly followed by multiple, spatially diverse, indirect reflections with increasing temporal density, often referred to as the diffuse sound field. The direct sound intensity decays in inverse proportion to the squared distance from the loudspeaker (inverse square law).
The corresponding direct sound level $L_i^{DS}$ of loudspeaker $i$, in decibels, is

$$L_i^{DS} = 10\log_{10}\left(\frac{Q_i P_i}{4\pi d_i^2}\right), \tag{1}$$

where $P_i$ is the acoustic power of source $i$, $Q_i$ its directivity factor in the direction of the listener, and $d_i$ the distance to the source. The diffuse sound intensity is almost constant: it depends on the room characteristics and varies little with source and receiver position, orientation, or distance. The distance at which direct and diffuse intensities are equal is commonly referred to as the critical distance $D_c$.

The overall loudness at the listening position for a given loudspeaker can be inferred by treating the total sound as the sum of the direct sound and the diffuse sound field. On axis, the loudness can be estimated from the total sound intensity in decibels as

$$L_i = 10\log_{10}\left[\frac{Q_i P_i}{4\pi}\left(\frac{1}{d_i^2} + \frac{1}{D_c^2}\right)\right]. \tag{2}$$

In practice, $L_i$ can also be obtained from a measurement with a sound level meter, e.g., when capturing pink noise. An equivalent calibration can be achieved through the acquisition and analysis of impulse responses (IRs). The loudness can be estimated from the RMS value of the IR $h_i(t)$, optionally filtered with a weighting filter $w_A(t)$, for example A-weighting:

$$L_i = 20\log_{10} \mathrm{RMS}\left\{ (w_A * h_i)(t) \right\}, \tag{3}$$

where $*$ denotes convolution.

The direct sound level $L_i^{DS}$ of each loudspeaker cannot be measured with a sound level meter. One option is to use the room-independent model in (1). It can also be estimated by multiplying the measured impulse response with a time window that vanishes beyond a certain truncation time $t$ after the arrival of the first peak. Using a small, fixed truncation time has the drawback that frequencies below approximately the inverse of the truncation time cannot be adequately represented. A frequency-dependent truncation (FDT) kernel $k(n)$ [13] may instead be used to estimate the direct sound portion $h_i^{DS}$ of the impulse response. The frequency-dependent truncation filter truncates all frequency components of the impulse response to a time $t$ or smaller.
Most commonly, it truncates the lowest frequency under consideration at time $t$ and higher frequencies at times smaller than $t$. This approach has the advantage of representing the lower frequencies better without compromising the truncation of the impulse response at higher frequencies. Fig. 1 shows examples of FDT applied to impulse responses of non-equidistant loudspeakers in a reverberant room. The corresponding direct sound level $L_i^{DS}$ can be estimated by substituting $h_i^{DS}$ for $h_i$ in (3).

## 2.2 Full response compensation (FRC) of non-equidistant loudspeakers

The state-of-the-art calibration approach involves loudness matching and loudspeaker time alignment. Loudness matching ensures that each loudspeaker produces the same loudness at the listening position when fed with a reference signal. Given a set of loudspeakers producing a loudness $L_i$ at the listening position, the loudness compensation $\Delta L_i$ for each loudspeaker is

$$\Delta L_i = L_{\mathrm{ref}} - L_i, \tag{5}$$

where $L_{\mathrm{ref}}$ is a pre-established reference level. The loudness compensation gains are given by $10^{\Delta L_i/20}$.

To maintain loudness consistency, the gains $g_i$ produced by a panner are usually normalized so that the loudness of the phantom sound source equals the loudness of the corresponding sound source when emanating from a single loudspeaker only. For loudness-matched setups, this requires that the following relationship be satisfied:

$$\left(\sum_i |g_i|^p\right)^{1/p} = 1, \tag{6}$$

with $p$ usually between 1 and 2. The common sine/cosine pairwise panning law is an example that satisfies this condition for $p = 2$. However, any panning law can meet this requirement through normalization.

Fig. 1: Measured IR (black line) for a loudspeaker at 1.5 m (top) vs. 3 m (bottom) distance, leading to a theoretical direct sound decay of 6 dB. Analysis of the IRs with frequency-dependent truncation (grey line) shows a 6.3 dB level difference (pink- and A-weighted) in direct sound vs. 3.0 dB in overall sound (black line) between the near and far loudspeaker.
<details> <summary>Image 1 Details</summary> ![9cdd3323](/v1/image/9cdd3323ba6294c8a24402da44953acf3e7778b6bb746535378c202e4928ef3b)

Two stacked panels of impulse-response level (−1.0 to 1.0) versus time [ms] (0 to 40 ms). Top: loudspeaker at 1.5 m; bottom: loudspeaker at 3 m. Black: measured IR; grey: FDT-truncated direct sound.

</details>

Loudspeaker time alignment consists of adding time delays to the closer loudspeakers so that all loudspeaker signals arrive at the listening position at the same time. The delays $\Delta t_i$ applied to each loudspeaker are

$$\Delta t_i = \frac{d_{\mathrm{ref}} - d_i}{c}, \tag{7}$$

where $c$ is the speed of sound and $d_{\mathrm{ref}}$ is a reference distance, usually the distance to the most distant loudspeaker.

## 3 Improved panning on non-equidistant loudspeakers

As mentioned in the introduction, we observed that the phantom source is systematically pulled towards the closest loudspeakers when using the full response compensation approach outlined in Sec. 2.2.
Here we propose an alternative procedure that restores the phantom source to its intended position by matching the direct sound from each loudspeaker, while preserving the correct loudness by matching levels derived from the full response.

## 3.1 Improved phantom source location: direct sound compensation (DSC)

Given a set of loudspeakers whose direct sound is characterized by a level $L_i^{DS}$, measured in decibels at the listener position, the direct sound compensation $\Delta L_i^{DS}$ for each loudspeaker is

$$\Delta L_i^{DS} = L_{\mathrm{ref}}^{DS} - L_i^{DS}, \tag{8}$$

where $L_{\mathrm{ref}}^{DS}$ is a reference direct sound level. The direct sound compensation gains are $10^{\Delta L_i^{DS}/20}$.

We assume that the loudspeaker calibration according to the full response compensation procedure outlined in Sec. 2.2 is already in place. To place the phantom sources correctly, the direct sound compensation needs to be applied to the gains and the effect of the loudness compensation needs to be undone. Therefore, the panning gains $g_i$ coming from the amplitude panning algorithm are modified as follows:

$$g_i' = g_i \, 10^{(\Delta L_i^{DS} - \Delta L_i)/20}. \tag{9}$$

## 3.2 Loudness correction

Applying (9) places the phantom source images in their correct locations, but the loudness of each phantom source will in general not be correct, as the perception of loudness is governed by the level of the entire room response, not only by the direct sound. To recover the correct loudness of the phantom sound sources, the gains $g_i'$ resulting from direct sound compensation are normalized to meet the condition in (6):

$$g_i'' = \frac{g_i'}{\left(\sum_j |g_j'|^p\right)^{1/p}}. \tag{10}$$

The complete system, a combination of the full response compensation approach with the additional direct sound compensation gain per source object, is depicted in Fig. 2.
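The gain chain of Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the levels come from the inverse-square model and a simplified direct-plus-diffuse model (the diffuse term equals the on-axis direct term at a hypothetical critical distance of 2 m) rather than from measured IRs; `sincos_pan` is a hypothetical sine/cosine panner that maps angle linearly onto the panning angle; the speed of sound is assumed to be 343 m/s; and the reference levels are assumed to be those of the quietest loudspeaker. All function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def direct_level_db(power, q, distance):
    """Inverse-square direct-sound level in dB (arbitrary reference), cf. Eq. (1)."""
    return 10.0 * math.log10(q * power / (4.0 * math.pi * distance ** 2))


def total_level_db(power, q, distance, critical_distance):
    """Direct plus diffuse intensity in dB; the diffuse term equals the on-axis
    direct term at the critical distance (simplified model, cf. Eq. (2))."""
    direct = q * power / (4.0 * math.pi * distance ** 2)
    diffuse = q * power / (4.0 * math.pi * critical_distance ** 2)
    return 10.0 * math.log10(direct + diffuse)


def sincos_pan(theta_deg, half_width_deg=30.0):
    """Hypothetical sine/cosine pair panner, linear in angle.
    +half_width_deg is the left loudspeaker. Returns [g_left, g_right]."""
    phi = math.radians((theta_deg + half_width_deg) / (2.0 * half_width_deg) * 90.0)
    return [math.sin(phi), math.cos(phi)]


def time_alignment_delays(distances, c=SPEED_OF_SOUND):
    """Eq. (7) with d_ref set to the most distant loudspeaker (seconds)."""
    d_ref = max(distances)
    return [(d_ref - d) / c for d in distances]


def dsc_gains(g, level_db, direct_db, p=2.0):
    """Gain chain of Fig. 2 (without delays). Returns (g_norm, G), the
    normalized DSC gains g_i'' and the final loudspeaker gains G_i."""
    # Assumed reference levels: those of the quietest loudspeaker. Any common
    # offset cancels in the normalization step below.
    ref, ref_ds = min(level_db), min(direct_db)
    dl = [ref - lvl for lvl in level_db]         # Eq. (5): loudness compensation
    dl_ds = [ref_ds - lvl for lvl in direct_db]  # Eq. (8): direct sound compensation
    # Eq. (9): apply DSC and undo the FRC loudness matching.
    g1 = [gi * 10.0 ** ((a - b) / 20.0) for gi, a, b in zip(g, dl_ds, dl)]
    # Eq. (10): renormalize so that the condition of Eq. (6) holds again.
    norm = sum(abs(x) ** p for x in g1) ** (1.0 / p)
    g_norm = [x / norm for x in g1]
    # FRC loudness-matching gains 10^(dL_i/20), applied last as in Fig. 2.
    G = [x * 10.0 ** (d / 20.0) for x, d in zip(g_norm, dl)]
    return g_norm, G


if __name__ == "__main__":
    distances = [1.5, 3.0]   # near left, far right, as in the test setup of Sec. 4
    critical_distance = 2.0  # hypothetical value in m
    L = [total_level_db(1.0, 1.0, d, critical_distance) for d in distances]
    L_ds = [direct_level_db(1.0, 1.0, d) for d in distances]
    g_norm, G = dsc_gains(sincos_pan(0.0), L, L_ds)  # phantom source at 0 deg
    delays_ms = [1e3 * t for t in time_alignment_delays(distances)]
    print([round(x, 3) for x in G], [round(t, 2) for t in delays_ms])
    # prints [0.411, 0.822] [4.37, 0.0]
```

In this example the far loudspeaker receives twice the gain of the near one, which cancels the 6 dB direct-sound advantage of the near loudspeaker, so both direct sounds arrive at equal level; with FRC alone (gains $g_i\,10^{\Delta L_i/20}$), the near loudspeaker's direct sound would remain about 3.2 dB stronger. Assuming incoherent (energy) summation at the listener, $\sum_i G_i^2\,10^{L_i/10}$ equals the level of the reference loudspeaker, so the $p = 2$ normalization keeps the overall loudness unchanged. The choice of reference levels does not affect $g_i''$, since the common offset cancels in (10).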
Combining the gain stages from (9) and (10) with the full response loudness compensation gains $10^{\Delta L_i/20}$, the combined gains $G_i$ for a source fed to each loudspeaker are

$$G_i = g_i'' \, 10^{\Delta L_i/20} = \frac{g_i \, 10^{\Delta L_i^{DS}/20}}{\left(\sum_j \left|g_j \, 10^{(\Delta L_j^{DS} - \Delta L_j)/20}\right|^p\right)^{1/p}}. \tag{11}$$

Should the method outlined here be applied to a loudspeaker setup calibrated differently from the state-of-the-art FRC procedure, the specific details in Fig. 2, as well as (9) and (10), would change, but (11) would still be valid.

## 3.3 Practical implementation

From (11), the final panning gains clearly depend on the specifics of the loudspeaker layout but, more critically, they depend on it in a manner that varies with phantom source location. This can be seen by noting that the denominator of (11) is a function of the unmodified amplitude panning gains $g_j$ across all loudspeakers and will therefore in general differ between phantom source locations. As such, a practical implementation requires a render-at-playback-time approach, where the panning gains of each source are applied independently based on the actual loudspeaker layout before the sources are mixed together into loudspeaker feeds. This accounts for direct sound and overall level differences on a per-source basis. This approach works naturally with object-based audio formats, but it can also be applied to pre-rendered channel-based formats by treating each channel as a "static object" with an assumed canonical playback position.

This paper presents a broadband analysis and compensation of direct sound and overall loudness. All considerations can be extended to a frequency-dependent, narrowband calibration based on measurements in the listening room.

## 4 Experimental methods

To formally confirm the theoretical and practical findings, a two-part listening test was conducted, with each part isolating one audio attribute of interest: the first part focused on the spatial location of phantom sound sources (Sec. 4.1); the second part targeted the validation of the applied loudness correction (Sec. 4.2).

Fig. 2: System diagram of a panning algorithm enhanced by direct sound compensation, followed by full response loudness compensation and time alignment (dotted box).

<details> <summary>Image 2 Details</summary> ![18133206](/v1/image/18133206ecfe8ed108f75f8797ecc8153078eba6c739ba08740df3f7b1ca2948)

Block diagram, from left to right: audio object and position metadata → per-loudspeaker panning gains $g_1, \dots, g_n$ → direct sound compensation gains $10^{(\Delta L_i^{DS} - \Delta L_i)/20}$ → common loudness normalization gain $1/\left(\sum_j |g_j'|^p\right)^{1/p}$ → loudness matching gains $10^{\Delta L_i/20}$ → time alignment delays $\Delta t_1, \dots, \Delta t_n$ → loudspeakers.

</details>

The physical audio system was shared between the two experiments and was set up in an acoustically untreated room, matching typical living room conditions. It consisted of two stereo setups, each with loudspeakers at 30° and −30°.
One setup had the two loudspeakers placed equidistant at 300 cm, at a height of 120 cm. The other had the left loudspeaker at half the distance (150 cm) of the right one (300 cm); both loudspeakers of this non-equidistant setup were at a height of 104 cm. A small loudspeaker model (Genelec 8020) was chosen to minimize acoustic impact in the form of occlusion and scattering from the lower, closer loudspeaker on the one behind it. The average ear height of the seated participants was 112 cm, midway between the two systems, ensuring an unobstructed acoustic path from both loudspeaker setups to the listener. Fig. 3 shows a schematic view of the listening test setup along with a picture of the actual setup.

The loudspeakers were delay- and level-aligned according to the FRC calibration procedure based on measured impulse responses. The corresponding IRs, which were also analyzed with FDT to determine the direct sound levels, are shown in Fig. 1. These direct sound levels matched the inverse square law (1).

The listening test was realized using the webMUSHRA software [14]. There were 16 participants (13 male, 3 female) with an average age of 39.4 years. In a questionnaire, 56% stated that they are audio professionals, 43% had past listening test experience, and 19% claimed to be expert spatial audio listeners.

## 4.1 Localization test

In the first listening test, participants were asked to evaluate the perceived angle of phantom sound sources. As shown in Fig. 4, three conditions were presented on each page of the listening test software. The three conditions used the same mono source content panned to the same intended angle, each with a different panning approach. For all three, the underlying panning law was sine/cosine panning. Intended source angles were 30°, 15°, 0° and −8°. The REF condition used the equidistant loudspeakers. The FRC condition refers to the non-equidistant loudspeakers, delay- and level-aligned as described in Sec. 2.2.
DSC refers to panning on the same system according to the methodology described in Sec. 3. The mono source content was a selection of a pop song, pink noise bursts, female speech, drums, and harpsichord samples. The UI position of each stimulus was initialized to a random position; similarly, the order of all stimuli was randomized. The participants were instructed to switch between the three conditions on each page and to drag and drop small spheres to the positions indicating the perceived azimuth of the phantom sound sources. Markers at 10° steps on the wall of the room matched identical indicators in the user interface of the listening test software and helped the listeners connect the interface to the physical room.

Fig. 3: A schematic top and front view of the listening test setup along with a picture of the actual setup. Four dots indicate the intended phantom source angles at 30°, 15°, 0° and −8°. 10° markers help the participants connect reality to the listening test interface.

<details> <summary>Image 3 Details</summary> ![182279ac](/v1/image/182279ac356e25b8fa33308324a6e2545caa1fdd930e0ad937171ccca22d9617)

Three panels: a top view and a front view of the loudspeaker layout relative to the listener, with the 300 cm and 150 cm loudspeaker distances marked, colored dots for the intended phantom source angles (30°: yellow, 15°: green, 0°: orange, −8°: purple) and grey vertical markers; and a photograph of the loudspeakers on stands in front of the wall with matching grey tape markers.

</details>

Fig. 4: Listening test interface used in the localization experiment. The intended angle is shared among the three conditions per page. The REF, FRC and DSC systems are rated simultaneously.

<details> <summary>Image 4 Details</summary> ![8ffb6484](/v1/image/8ffb648411f9fb8d22dfe22a4ded3cdf4d764db1c4350f953133d7631cb8f164)

Three condition columns, each with play/pause controls and a colored marker (red, orange, blue) that the listener drags along a shared horizontal azimuth scale with vertical markers.

</details>

Five participants were excluded from the localization test. Four of them were excluded because, in more than 15% of the cases, they reported the anchor (a source hard-panned to the left loudspeaker at 30°) as being located at less than 15°. Another participant was excluded due to inconsistent reporting.

## 4.2 Loudness test

To validate the loudness correction for phantom sound sources, listeners were asked to participate in a second part of the listening test. The methodology was adapted from the loudness validation test proposed in [15].
The standardized ITU-R BS.1534 MUSHRA [16] interface was used, where the explicit and hidden reference was a source panned on the symmetric loudspeaker layout (REF). The participants were asked to rate the same phantom source, panned on the non-equidistant loudspeaker setup, according to its similarity to the reference purely with respect to loudness. Two variants of DSC-panned sources were presented, depending on whether the direct sound compensation included the loudness correction in (10) or not: DSC LC, with loudness correction, and DSC NO LC, without it. Furthermore, the reference scaled by -10 dB was added as an anchor (ANCH). Listeners provided ratings on the MUSHRA scale with the verbal anchors bad, poor, fair, good and excellent. Phantom sound sources were panned to 30°, 15° and 0°, using the same mono content selection as in the previous part of the test. To shorten the test, -8° was left out, since the smallest differences were expected for it. One participant was excluded from the second test for rating the hidden reference below 90 points in more than 15% of the cases.

## 5 Experimental results

The statistical analysis follows the general guidelines in ITU-R BS.1534 [16] and was carried out using the rstatix package in R [17, 18].

Fig. 5: Localization test: perceived angular locations as a function of the four intended angles [30° (i), 15° (ii), 0° (iii), -8° (iv)] and the test condition (REF, FRC, DSC). Dots represent the result of each participant, averaged over all 5 content items, and the box plots show the corresponding median values and interquartile range. <details> <summary>Image 5 Details</summary> ![1e9c9aae](/v1/image/1e9c9aae59822b7bb632b320cc9d361e6aa5881a44feff692b8045bdeec226b4) Box plot of perceived angle (deg, y-axis) versus intended angle (deg, x-axis: 30, 15, 0, -8) for REF (red), FRC (green) and DSC (blue), with individual participants overlaid as dots. </details>

## 5.1 Localization test

Initially, the normality of the data was examined by means of a QQ plot, which revealed no apparent deviations from normality.
A 3-way repeated measures ANOVA was conducted to examine whether the perceived angular positions depended on the test content. No significant interaction between the test condition and the content item was revealed [F(8, 80) = 0.8, p = .6]. Subsequently, results were averaged over the different source content items. The resulting data distribution is shown in Fig. 5 as a function of the three test conditions (REF, FRC, DSC) and the four panning angles [30° (i), 15° (ii), 0° (iii), -8° (iv)]. The median perceived positions for the symmetric reference system were 28° (i), 18° (ii), -2° (iii), and -13° (iv), showing a slight displacement from their nominal positions.

Fig. 6: Localization test: mean delta perceived angular positions relative to the reference. Dots represent the mean values and bars the 95% confidence intervals of the mean. The stars in the plot indicate statistically significant t-tests (adjusted for multiple comparisons). One star (*) denotes p < .05, two stars (**) denote p < .01, three stars (***) denote p < .001, and four stars (****) denote p < 10⁻⁴. <details> <summary>Image 6 Details</summary> ![c3cb8641](/v1/image/c3cb8641dae1b1100bf1b0e35861c7016d290ebc2c2e3c84d3164f1e344e7d9b) Scatter plot with error bars of perceived angle delta (deg, y-axis) versus intended angle (deg, x-axis: 30, 15, 0, -8) for REF (red), FRC (green) and DSC (blue), with significance stars above the data points. </details>

A 2-way repeated measures ANOVA was performed to examine the effects of the test condition and intended angle on the results. The ANOVA confirmed a significant main effect of the test condition [F(2, 20) = 138.7, p = 2 × 10⁻¹²], as well as a significant interaction between the test condition and intended angle [F(3.0, 29.5) = 17.0, p = 1 × 10⁻⁶]. To further investigate the differences between angles and the three test conditions, multiple paired t-tests were conducted. We used the Benjamini-Hochberg method to account for multiple comparisons [16]; all stated p-values are already adjusted for this correction.
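The Benjamini-Hochberg procedure controls the false discovery rate across a family of tests rather than the family-wise error rate. A minimal Python sketch of the p-value adjustment follows; the function name and the four raw p-values are hypothetical, and the paper's own adjustment was performed in R.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    The p-value with ascending rank k among m tests is scaled by m / k. A running
    minimum taken from the largest rank downwards keeps the adjusted values
    monotone in the raw p-values, and no adjusted value exceeds 1.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # rank m (largest p) down to rank 1
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four paired t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

This is the same step-up rule as R's `p.adjust(p, method = "BH")`: an adjusted value at or below .05 corresponds to rejecting that hypothesis at a 5% false discovery rate.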
<details> <summary>Image 7 Details</summary> ![7553aae4](/v1/image/7553aae49fcd8a375c0660b26471fd67712bc2ad0c853ea6995f9b31913c7d62) Box plot of MUSHRA score (0 to 100, y-axis) versus intended angle (deg, x-axis: 30, 15, 0) for the test conditions REF, DSC NO LC, DSC LC and ANCH, with individual participants overlaid as dots. </details> Fig. 7: Loudness test: MUSHRA score as a function of the three intended angles [30° (i), 15° (ii), 0° (iii)] and the test condition (REF, DSC NO LC, DSC LC, ANCH). Dots represent the result of each participant, averaged over all 5 content items, and box plots show the corresponding median values and interquartile range.

Refer to Fig. 6 for a depiction of the perceived angle deltas with respect to the reference and the results of the paired t-tests. At 30° (hard panning to the left loudspeaker), all panning methods were statistically indistinguishable from one another (p ≥ .4).
For the remaining phantom source positions, the average FRC results exhibited a consistent displacement of 9 to 11 degrees towards the closer loudspeaker, with these differences being significant in all cases (p ≤ 1 × 10⁻⁴). The average DSC results were much closer to the reference, but still displayed a slight displacement towards the closer loudspeaker: 6° (ii), 3° (iii), and 1° (iv). The differences were significant in cases (ii) and (iii) (p ≤ .002), but not in case (iv) (p = .2).

Fig. 8: Loudness test: differential mean MUSHRA scores relative to the reference as a function of the three intended angles and the test condition. Dots represent the mean values and bars the 95% confidence intervals of the mean. The results shown were standardized across participants (see main text). See the caption of Fig. 6 for the meaning of the significance stars. <details> <summary>Image 8 Details</summary> ![8dabed9b](/v1/image/8dabed9b5e862361e6e6366b6290e932475faeb7dadd33fa43900e1216cea18b) Scatter plot with error bars of the MUSHRA score difference relative to the reference (y-axis) versus intended angle (deg, x-axis: 30, 15, 0) for REF (red), DSC NO LC (green) and DSC LC (blue), with significance stars above the data points. </details>

## 5.2 Loudness test

Initially, we examined whether the results of the loudness validation test depended on the test content. A 3-way repeated measures ANOVA revealed no significant interaction between the test condition and the content item [F(4.4, 38) = 2.3, p = .07]. Subsequently, results were averaged over the different source content items. The resulting data distribution is shown in Fig. 7 as a function of the four test conditions (REF, DSC NO LC, DSC LC, ANCH) and the three panning angles [30° (i), 15° (ii), 0° (iii)]. The DSC LC condition always scores in the excellent range (above 80 MUSHRA points). The DSC NO LC condition scores systematically below DSC LC, the difference being greater for panning angles closer to the left loudspeaker.

The QQ plot initially indicated moderate deviations from normality, which were attributed to participants using the rating scale differently. To address this, a data normalization procedure was implemented: each participant's results were standardized to zero mean and unit variance, and the MUSHRA scale was then restored by multiplying the standardized results by the global standard deviation and adding the global mean. Following this procedure, the QQ plot no longer indicated evident deviations from normality. The anchor was discarded from the subsequent analysis. We conducted a 2-way repeated measures ANOVA to examine the effects of the test condition and the intended angle on the results.
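The per-participant standardization applied to the loudness ratings before this analysis can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' R code: the `standardize_per_participant` name and the ratings are hypothetical, population standard deviations are assumed (the text does not state which estimator was used), and a participant with zero spread is simply placed at the global mean. Because each participant's scores are scaled to unit variance, it is the global standard deviation (the square root of the global variance) that restores the original spread of the MUSHRA scale.

```python
from statistics import mean, pstdev


def standardize_per_participant(scores):
    """Map each participant's ratings onto a common MUSHRA scale.

    scores: dict mapping a participant ID to a list of that participant's ratings.
    Each participant's ratings are z-scored (zero mean, unit variance) and then
    mapped back using the global mean and global standard deviation of all ratings.
    """
    all_ratings = [r for ratings in scores.values() for r in ratings]
    global_mean = mean(all_ratings)
    global_sd = pstdev(all_ratings)  # population SD: an assumption
    rescaled = {}
    for participant, ratings in scores.items():
        mu, sd = mean(ratings), pstdev(ratings)
        if sd == 0:
            # Constant ratings carry no scale information; place them at the global mean.
            rescaled[participant] = [global_mean] * len(ratings)
        else:
            rescaled[participant] = [global_mean + global_sd * (r - mu) / sd for r in ratings]
    return rescaled


# Hypothetical ratings: p2 gives the same relative judgements as p1, shifted down by 30 points.
ratings = {"p1": [100, 80, 60, 90], "p2": [70, 50, 30, 60]}
print(standardize_per_participant(ratings))
```

In this example both listeners end up with identical rescaled ratings centered on the global mean of 67.5, since they differ only by a constant offset in how they used the scale.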
The ANOVA analysis confirmed a significant main effect of the test condition [F(1.4, 20.8) = 90.3, p = 6 × 10⁻¹⁰] and a significant interaction between test condition and intended angle [F(2.7, 40.0) = 28.1, p = 2 × 10⁻⁹]. A subsequent post-hoc analysis was conducted in the form of multiple paired t-tests between the different test conditions (see Fig. 8). Again, the Benjamini-Hochberg correction for multiple comparisons [16] was applied. Without loudness correction, scores are on average 33 (i), 27 (ii), and 14 (iii) MUSHRA points lower than the reference. With loudness correction, this difference is reduced to 16 (i), 13 (ii), and 12 (iii) MUSHRA points. All mutual comparisons are significant (p ≤ .005).

## 6 Discussion

The results of the experimental tests show that the common practice of time- and level-aligning loudspeakers is insufficient for non-equidistant loudspeakers, as the phantom source for the FRC system is consistently skewed towards the closer loudspeaker. The average perceived angle delta of about 10° across the phantom positions under test (15°, 0° and -8°) is large and would substantially impair playback performance. At all of these positions, the proposed DSC approach significantly reduces the deviation from the intended panning position. It is noteworthy that, according to the experiment, DSC performs particularly well in the area in front of the listener, where human hearing is most sensitive to angular changes. It is worth mentioning that at the largest examined phantom (not hard-panned) angle, 15°, the experiment still showed a relatively high bias towards the closer loudspeaker (6°). While the calculated compensation gain may not have been entirely accurate, it is also conceivable that visual cues from the closer loudspeaker pull the ratings towards it as the intended panning position approaches it.
After all, phantom source localization is a complex task affected by multisensory factors.

The results of the loudness validation test agree with our hypothesis that the loudness of phantom sources is defined not by the direct sound, but by the full loudspeaker and room response. Across all tested angles, DSC panning with loudness compensation based on the full response received a mean rating in the excellent range of the MUSHRA scale, significantly better in all cases than DSC without loudness compensation. The angle dependence of the ratings for the uncompensated condition matches the calculated correction in dB, which diminishes as the phantom source is panned further away from the closer loudspeaker. It is unsurprising that listeners still rated the loudness-compensated DSC as different from the reference system. These differences can mainly be attributed to other system characteristics, such as the direct-to-reverberant ratio or the spatial characteristics of the close loudspeaker, which listeners may not have fully ignored despite being instructed to do so.

While the formal listening test considered a single phantom source on a stereo layout, typical multimedia content contains many panned objects, and the benefit of restoring each of them to its intended position with DSC accumulates. In informal listening with a DSC-enabled object renderer and multiple loudspeakers, listeners reported not only that the overall balance, otherwise heavily skewed towards close loudspeakers, was restored, but also that the clarity of the mix was vastly improved.
These improvements were reported even when the difference in loudspeaker distance was less pronounced than in the formal listening test. Especially in the context of object-based audio and flexible rendering engines at playback time, this approach is a notable step towards a faithful representation of the artistic intent in the consumer environment. As object-based content makes its way into more and more playback environments such as living rooms and cars, typical loudspeaker setups will become increasingly inhomogeneous and non-equidistant. Instead of forcing consumers to place loudspeakers in canonical positions, the system should be able to adapt. In this paper we have laid out that, as a consequence of the precedence effect, any panning algorithm and renderer will benefit from taking the relative direct sound into account.

## References

- [1] Rumsey, F., Spatial Audio, Taylor & Francis, 2012.
- [2] Pulkki, V., 'Virtual Sound Source Positioning Using Vector Base Amplitude Panning,' Journal of the Audio Engineering Society, 45(6), pp. 456-466, 1997.
- [3] Thomas, M. R. and Robinson, C. Q., 'Amplitude panning and the interior pan,' in Audio Engineering Society Convention 143, 2017.
- [4] Lossius, T., Baltazar, P., and de la Hogue, T., 'DBAP - Distance-Based Amplitude Panning,' in International Conference on Mathematics and Computing, 2009.
- [5] Tsingos, N., 'Object-Based Audio,' in A. Roginska and P. Geluso, editors, Immersive Sound, pp. 244-275, Routledge, 2017.
- [6] Gardner, M. B., 'Historical Background of the Haas and/or Precedence Effect,' The Journal of the Acoustical Society of America, 43(6), pp. 1243-1248, 2005, ISSN 0001-4966.
- [7] Haas, H., 'Über den Einfluß eines Einfachechos auf die Hörsamkeit von Sprache,' Acta Acustica united with Acustica, 1(2), pp. 49-58, 1951.
- [8] Blauert, J. and Braasch, J., 'Acoustic Communication: The Precedence Effect,' Forum Acusticum Budapest 2005: 4th European Congress on Acoustics, 2005.
- [9] Blauert, J., Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, 1996.
- [10] Bank, B., 'Combined quasi-anechoic and in-room equalization of loudspeaker responses,' in Audio Engineering Society Convention 134, 2013.
- [11] Cecchi, S., Romoli, L., Piazza, F., Bank, B., and Carini, A., 'A novel approach for prototype extraction in a multipoint equalization procedure,' in Audio Engineering Society Convention 136, 2014.
- [12] Matthews, E. A., Simulation and testing of a multichannel system for 3D sound localization, Master's thesis, Graduate School of Western Carolina University, 2015.
- [13] Karjalainen, M. and Paatero, T., 'Frequency-dependent signal windowing,' in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, pp. 35-38, 2001.
- [14] Schoeffler, M., Bartoschek, S., Stöter, F.-R., Roess, M., Westphal, S., Edler, B., and Herre, J., 'webMUSHRA - A Comprehensive Framework for Web-based Listening Tests,' Journal of Open Research Software, 6, 2018.
- [15] Berendes, H.-U., Travaglini, A., and Uhle, C., 'Validating Loudness Alignment Via Subjective Preference: Towards Improving ITU-R BS.1770-4,' Journal of the Audio Engineering Society, 2022.
- [16] ITU-R BS.1534-3, 'Method for the subjective assessment of intermediate quality level of audio systems,' ITU recommendation, 2015.
- [17] Kassambara, A., rstatix: Pipe-Friendly Framework for Basic Statistical Tests, 2023, R package version 0.7.2.
- [18] R Core Team, R: A Language and Environment for Statistical Computing, Vienna, Austria, 2023.
