Efficient Audio Super-Resolution with a Differentiable Psychoacoustic Loss
2LCTI, Télécom Paris, IP Paris, France
Abstract
Audio super-resolution is commonly seen as the task of enhancing low-bitrate audio signals by creating missing high-frequency content. This work proposes AEROMamba-PAQM, an efficient variant of the AERO super-resolution architecture in which attention and LSTM layers are replaced by the Mamba state-space model, and which incorporates a newly developed differentiable perceptual loss derived from the Perceptual Audio Quality Measure (PAQM). During training, the architecture requires approximately 2–4x less GPU memory than the baseline; during inference, it achieves a 14x speedup while using only about one-fifth of the GPU memory. When upsampling both a piano dataset and MUSDB18 from 11.025 kHz to 44.1 kHz, subjective listening tests show that AEROMamba-PAQM outperforms AERO by 15% in perceived quality scores. To address the broader problem of improving audio that has been heavily compressed by lossy coding, this work further proposes AEROMamba-PAQM++, which applies the same framework but replaces the STFT reconstruction losses with the PAQM loss, specifically to enhance MP3-encoded audio at 32 kbps. In listening evaluations, AEROMamba-PAQM++ achieves a 52% higher quality rating than AEROMamba-PAQM when restoring compressed audio. These results demonstrate that PAQM-driven training coupled with lightweight state-space modeling yields high perceptual quality and computational efficiency in both band-limited and compressed audio scenarios.
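To make the band-limited setting concrete, the short sketch below works out which frequency band has to be synthesized when going from 11.025 kHz to 44.1 kHz. It only applies the Nyquist relation to the two sampling rates quoted in the abstract; the variable and function names are illustrative and do not come from the authors' code.

```python
# Sampling rates quoted in the abstract, in Hz.
LOW_SR = 11_025
HIGH_SR = 44_100


def nyquist(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sampling rate."""
    return sample_rate_hz / 2


upsampling_factor = HIGH_SR // LOW_SR      # integer ratio between the two rates
low_band_limit = nyquist(LOW_SR)           # upper edge of the input content
high_band_limit = nyquist(HIGH_SR)         # upper edge of the target content

print(f"Upsampling factor: {upsampling_factor}x")
print(f"Band to reconstruct: {low_band_limit:.1f} Hz to {high_band_limit:.1f} Hz")
```

In other words, the input carries content only up to about 5.5 kHz, and the model has to generate everything between roughly 5.5 kHz and 22.05 kHz.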
Results: Super-resolution of Bandlimited Audio
Results for the MUSDB and PianoEval datasets comparing ViSQOL, LSD, and subjective scores, as well as performance metrics on an NVIDIA RTX 3090 GPU for 10-second samples.
MUSDB Results
| Model | ViSQOL ↑ | LSD ↓ | Score ↑ |
|---|---|---|---|
| Low-Resolution | 1.82 | 3.98 | 38.22 |
| AERO | 2.90 | 1.34 | 60.03 |
| AEROMamba | 2.93 | 1.23 | 66.47 |
| AEROMamba-PAQM | 3.04 | 1.19 | 79.26 |
| AudioSR | 3.01 | - | - |
PianoEval Results
| Model | ViSQOL ↑ | LSD ↓ | Score ↑ |
|---|---|---|---|
| Low-Resolution | 4.36 | 1.09 | 72.92 |
| AERO | 4.38 | 0.99 | 76.89 |
| AEROMamba-HQ | 4.38 | 1.00 | 84.41 |
| AEROMamba-PAQM-HQ | 4.41 | 0.90 | 78.76 |
Models labeled with `-HQ` were trained on PianoEval-HQ.
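For readers unfamiliar with the LSD column, the following is a minimal sketch of one common definition of log-spectral distance: the per-frame root-mean-square difference between log power spectra, averaged over frames. The STFT settings (`n_fft=2048`, `hop=512`, Hann window) and the `eps` floor are assumptions chosen for illustration; the page does not state the exact configuration used to produce the tables.

```python
import numpy as np


def power_spectrogram(x, n_fft=2048, hop=512):
    """Hann-windowed STFT power spectrogram with shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, eps=1e-10):
    """Mean over frames of the RMS difference between log10 power spectra."""
    ref = np.log10(power_spectrogram(reference) + eps)
    est = np.log10(power_spectrogram(estimate) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


# Toy signals (not from the datasets): a reference with a high partial,
# and a "band-limited" version where that partial is missing.
sr = 44_100
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 8000 * t)
bandlimited = np.sin(2 * np.pi * 440 * t)

lsd_same = log_spectral_distance(reference, reference)
lsd_diff = log_spectral_distance(reference, bandlimited)
print(lsd_same, lsd_diff)
```

Lower values mean the estimate's spectrum is closer to the reference, which is why the tables mark LSD with a downward arrow.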
Performance Comparison (NVIDIA RTX 3090)
| Method | GPU Usage (MB) | Time (s) | Parameters |
|---|---|---|---|
| AERO | 17091 | 1.246 | 19,432,958 |
| AEROMamba | 3000 | 0.087 | 20,964,190 |
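The speedup and memory figures quoted in the abstract can be checked directly against this table. The snippet below is plain arithmetic on the reported numbers; the dictionary keys are illustrative names only.

```python
# Values copied from the performance table (10-second samples, RTX 3090).
aero = {"gpu_mb": 17091, "time_s": 1.246, "params": 19_432_958}
aeromamba = {"gpu_mb": 3000, "time_s": 0.087, "params": 20_964_190}

speedup = aero["time_s"] / aeromamba["time_s"]
memory_fraction = aeromamba["gpu_mb"] / aero["gpu_mb"]
param_increase = aeromamba["params"] / aero["params"] - 1

print(f"Inference speedup: {speedup:.1f}x")
print(f"GPU memory relative to AERO: {memory_fraction:.1%}")
print(f"Parameter count change: {param_increase:+.1%}")
```

The ratios come out at roughly 14x faster inference and a little under one-fifth of the memory, even though AEROMamba has slightly more parameters than AERO.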
Subjective Score Distributions
[Figure panels: MUSDB, PianoEval]
Statistical Tests (Mann-Whitney U)
Pairwise comparisons of subjective scores (p-values). Values < 0.05 are considered statistically significant.
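As an illustration of how such a pairwise p-value can be obtained, here is a self-contained sketch of a two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections. This is an assumption about the variant; the page does not say whether an exact or asymptotic method was used, and the score lists below are made-up toy data, not the listening-test responses.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks (ties get their average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sd_u == 0:
        return u1, 1.0
    # Continuity correction, clamped so z is never negative.
    z = max(abs(u1 - mean_u) - 0.5, 0.0) / sd_u
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, p_value


# Hypothetical listener scores for two systems (toy data).
scores_a = [80, 85, 78, 90, 88, 76, 84, 91]
scores_b = [60, 55, 62, 70, 58, 65, 59, 61]

u_stat, p = mann_whitney_u(scores_a, scores_b)
_, p_same = mann_whitney_u(scores_a, scores_a)
print(f"U = {u_stat}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

In practice, `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` provides the same test and would normally be preferred over a hand-written version.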
MUSDB (Subjective)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba | < 0.0001 |
| Low Res vs. AEROMamba-PAQM | < 0.0001 |
| Low Res vs. AERO | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM | < 0.0001 |
| AEROMamba vs. AERO | 0.0089 |
| AEROMamba-PAQM vs. AERO | < 0.0001 |
PianoEval (Subjective)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba-HQ | < 0.0001 |
| Low Res vs. AEROMamba-PAQM-HQ | 0.0587 |
| Low Res vs. AERO | 0.3399 |
| AEROMamba-HQ vs. AEROMamba-PAQM-HQ | 0.0101 |
| AEROMamba-HQ vs. AERO | 0.0003 |
| AEROMamba-PAQM-HQ vs. AERO | 0.2975 |
Pairwise comparisons of ViSQOL scores (p-values).
MUSDB (ViSQOL)
| Comparison | p-value |
|---|---|
| AEROMamba vs. AEROMamba-PAQM | < 0.0001 |
| AEROMamba vs. AERO | 0.0007 |
| AEROMamba-PAQM vs. AERO | < 0.0001 |
| AEROMamba vs. Low Resolution | < 0.0001 |
| AEROMamba-PAQM vs. Low Resolution | < 0.0001 |
| AERO vs. Low Resolution | < 0.0001 |
| AudioSR vs. AEROMamba-PAQM | 0.2178 |
PianoEval (ViSQOL)
| Comparison | p-value |
|---|---|
| Low Res vs. AERO | < 0.0001 |
| Low Res vs. AERO-HQ | < 0.0001 |
| Low Res vs. AEROMamba | 0.2989 |
| Low Res vs. AEROMamba-HQ | < 0.0001 |
| Low Res vs. AEROMamba-PAQM | < 0.0001 |
| Low Res vs. AEROMamba-PAQM-HQ | < 0.0001 |
| Low Res vs. AudioSR | < 0.0001 |
| AERO vs. AERO-HQ | 0.1077 |
| AERO vs. AEROMamba | < 0.0001 |
| AERO vs. AEROMamba-HQ | 0.0215 |
| AERO vs. AEROMamba-PAQM | 0.6028 |
| AERO vs. AEROMamba-PAQM-HQ | 0.6917 |
| AERO vs. AudioSR | < 0.0001 |
| AERO-HQ vs. AEROMamba | < 0.0001 |
| AERO-HQ vs. AEROMamba-HQ | 0.0002 |
| AERO-HQ vs. AEROMamba-PAQM | 0.2574 |
| AERO-HQ vs. AEROMamba-PAQM-HQ | 0.2083 |
| AERO-HQ vs. AudioSR | < 0.0001 |
| AEROMamba vs. AEROMamba-HQ | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM-HQ | < 0.0001 |
| AEROMamba vs. AudioSR | < 0.0001 |
| AEROMamba-HQ vs. AEROMamba-PAQM | 0.1172 |
| AEROMamba-HQ vs. AEROMamba-PAQM-HQ | 0.0806 |
| AEROMamba-HQ vs. AudioSR | < 0.0001 |
| AEROMamba-PAQM vs. AEROMamba-PAQM-HQ | 0.8168 |
| AEROMamba-PAQM vs. AudioSR | < 0.0001 |
| AEROMamba-PAQM-HQ vs. AudioSR | < 0.0001 |
Audio Examples: MUSDB
Tracks upsampled from 11.025 kHz to 44.1 kHz
| Track | Original (Low-Res) 11.025 kHz | Original (High-Res) 44.1 kHz | AERO 11.025 → 44.1 kHz | AEROMamba 11.025 → 44.1 kHz | AEROMamba-PAQM 11.025 → 44.1 kHz |
|---|---|---|---|---|---|
| 459 | | | | | |
| 480 | | | | | |
| 826 | | | | | |
| 625 | | | | | |
Results: Super-resolution of Heavily Compressed Audio
Objective and subjective scores for the low-bitrate (MP3, 32 kbps) signals and for the various models, evaluated on MUSDB and PianoEval.
MUSDB Results
| System | ViSQOL ↑ | LSD ↓ | Score ↑ |
|---|---|---|---|
| Low-Bitrate | 1.80 | 2.02 | 50.7 |
| AEROMamba | 2.45 | 1.24 | 49.8 |
| AEROMamba-PAQM | 2.99 | 1.27 | 49.7 |
| AEROMamba-PAQM++ | 2.90 | 1.23 | 75.6 |
PianoEval Results
| System | ViSQOL ↑ | LSD ↓ | Score ↑ |
|---|---|---|---|
| Low-Bitrate | 4.35 | 2.33 | 69.5 |
| AEROMamba | 4.22 | 1.14 | 83.4 |
| AEROMamba-PAQM | 4.24 | 1.12 | 84.1 |
| AEROMamba-PAQM++ | 4.41 | 1.13 | 85.5 |
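The relative improvement quoted in the abstract for compressed audio can be reproduced from the subjective scores in these tables. The sketch below is simple arithmetic on the reported values; `relative_gain` is an illustrative helper, not part of any released code.

```python
def relative_gain(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100


# Subjective scores copied from the two tables above.
musdb = {"AEROMamba-PAQM": 49.7, "AEROMamba-PAQM++": 75.6}
piano = {"AEROMamba-PAQM": 84.1, "AEROMamba-PAQM++": 85.5}

musdb_gain = relative_gain(musdb["AEROMamba-PAQM++"], musdb["AEROMamba-PAQM"])
piano_gain = relative_gain(piano["AEROMamba-PAQM++"], piano["AEROMamba-PAQM"])

print(f"MUSDB: {musdb_gain:.1f}% higher score")
print(f"PianoEval: {piano_gain:.1f}% higher score")
```

The MUSDB figure matches the roughly 52% gain cited in the abstract, while the PianoEval gain is under 2%, consistent with the non-significant subjective p-values reported for that dataset.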
Subjective Score Distributions
[Figure panels: MUSDB, PianoEval]
Statistical Tests (Mann-Whitney U)
Pairwise comparisons of subjective scores (p-values). Values < 0.05 are considered statistically significant.
MUSDB (Subjective)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba | 0.7390 |
| Low Res vs. AEROMamba-PAQM | 0.8233 |
| Low Res vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM | 0.8751 |
| AEROMamba vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba-PAQM vs. AEROMamba-PAQM++ | < 0.0001 |
PianoEval (Subjective)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba | < 0.0001 |
| Low Res vs. AEROMamba-PAQM | < 0.0001 |
| Low Res vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM | 0.9193 |
| AEROMamba vs. AEROMamba-PAQM++ | 0.7168 |
| AEROMamba-PAQM vs. AEROMamba-PAQM++ | 0.7034 |
Pairwise comparisons of ViSQOL scores (p-values).
MUSDB (ViSQOL)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba | < 0.0001 |
| Low Res vs. AEROMamba-PAQM | < 0.0001 |
| Low Res vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM | < 0.0001 |
| AEROMamba vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba-PAQM vs. AEROMamba-PAQM++ | < 0.0001 |
PianoEval (ViSQOL)
| Comparison | p-value |
|---|---|
| Low Res vs. AEROMamba | < 0.0001 |
| Low Res vs. AEROMamba-PAQM | < 0.0001 |
| Low Res vs. AEROMamba-PAQM++ | 0.0052 |
| AEROMamba vs. AEROMamba-PAQM | 0.3122 |
| AEROMamba vs. AEROMamba-PAQM++ | < 0.0001 |
| AEROMamba-PAQM vs. AEROMamba-PAQM++ | < 0.0001 |
Audio Examples: MUSDB
Tracks restored from 32 kbps MP3 to 44.1 kHz
| Track | Low-Bitrate MP3 32 kbps | High-Res 44.1 kHz | AEROMamba | AEROMamba-PAQM | AEROMamba-PAQM++ |
|---|---|---|---|---|---|
| 459 | | | | | |
| 480 | | | | | |
| 826 | | | | | |
| 625 | | | | | |
PianoEval Dataset Metadata
We collected the PianoEval dataset, which consists of two parts. The first comprises the 24 Preludes for Piano, Op. 28, by Chopin, performed by 33 pianists in 45 different recordings available on CD (Compact Disc), totaling approximately 22 hours. The second part contains excerpts of Ligeti piano études, a Schumann sonata, and the Barber sonata, played by three different performers, respectively, totaling approximately 3.5 hours. Each file is stored in WAV format, in stereo, and sampled at 44.1 kHz. Information about performers, record labels, and recording years is detailed in the tables below.
Train/Validation
| Pianist | Record label | Year |
|---|---|---|
| Arrau, C. | Columbia | 1950/1 |
| Arrau, C. | Philips | 1973 |
| Argerich, M. | Deutsche Grammophon | 1975 |
| Ashkenazy, V. | Decca | 1976 |
| Ashkenazy, V. | Decca | 1992 |
| Bolet, J. | RCA | 1974 |
| Blechacz, R. | Deutsche Grammophon | 2007 |
| Cherkassky, S. | ASV | 1968 |
| Cortot, A. | HMV | 1926 |
| Cortot, A. | HMV | 1933/4 |
| Cortot, A. | Gramophone | 1942 |
| Cortot, A. | Archipel [live] | 1955 |
| Cortot, A. | EMI | 1957 |
| Davidovich, B. | Decca | 1979 |
| de Larrocha, A. | Decca | 1974 |
| Duchable, F. | Erato | 1988 |
| Dutra, G. | Yellow Tail | 1997 |
| El Bacha, A. R. | Forlane | 1999 |
| François, S. | EMI | 1959 |
| Freire, N. | Columbia | 1970 |
| Harasiewicz, A. | Philips | 1963 |
| Katsaris, C. | Sony | 1992 |
| Kissin, Y. | RCA | 1999 |
| Lima, A. M. | Caras¹ | 1981 |
| Lucchesini, A. | EMI | 1988² |
| Magaloff, N. | Philips | 1975 |
| Novaes, G. | Music and Arts [live] | 1949 |
| Ohlsson, G. | EMI | 1974 |
| Ohlsson, G. | Hyperion | 1989 |
| Perahia, M. | Columbia | 1975 |
| Petri, E. | Columbia | 1942 |
| Pires, M. | Erato | 1975 |
| Pires, M. | Deutsche Grammophon | 1992 |
| Pogorelich, I. | Deutsche Grammophon | 1989 |
| Pollini, M. | Deutsche Grammophon | 1974 |
| Pollini, M. | Deutsche Grammophon | 2011 |
| Proença, M. | Delphos | 1999 |
| Rubinstein, A. | RCA | 1946 |
| Switala, W. | NIFC | 2006/7 |
| Tiempo, S. | Victor | 1990 |
| Varsi, D. | Genuin | 1988 |
¹ Refers to a magazine.
² Refers to the release year, not the recording year.
Test
| Pianist | Record label | Year |
|---|---|---|
| Glemser, B. | Naxos | 1993 |
| Pollack, D. | Naxos | 1995 |
| Aimard, P. L. | Sony | 1995 |
Subjective Test Tracklist
The following table maps the Question IDs (QID) used during the subjective listening tests to the corresponding audio tracks from the MUSDB and PianoEval datasets.
| QID | Track | QID | Track |
|---|---|---|---|
| 1 | electronic01 | 13 | electronic02 |
| 2 | rock01 | 14 | rock02 |
| 3 | pop01 | 15 | pop02 |
| 4 | hiphop01 | 16 | hiphop02 |
| 5 | latin01 | 17 | reggae01 |
| 6 | other01 | 18 | other02 |
| 7 | 02Barber | 19 | 04Barber |
| 8 | 14Ligeti | 20 | 17Ligeti |
| 9 | 05Ligeti | 21 | 15Ligeti |
| 10 | 07Barber | 22 | 08Barber |
| 11 | 03Schumann | 23 | 04Schumann |
| 12 | 02Schumann | 24 | 15Schumann |