Using a GAN architecture to restore highly compressed music files

Spectrograms of (a) original audio fragments, (b) the corresponding 32 kbit/s MP3 versions, and (c), (d), (e) restorations with different z-noise randomly sampled from N(0, I). Credit: Lattner & Nistal.

Over the past few decades, computer scientists have developed increasingly advanced technologies and tools to store large amounts of music and audio files on electronic devices. A particular milestone for music storage was the development of MP3 technology (i.e., MPEG-1 Audio Layer III), a technique that compresses sound sequences or songs into very small files that can be easily stored and transferred between devices.

The encoding and compression of files in formats such as PKZIP, JPEG, GIF, PNG, MP3, AAC, Cinepak and MPEG-2 is achieved using a set of technologies known as codecs. A codec has two main components: an encoder that compresses files and a decoder that decompresses them.

There are two types of codecs: lossless and lossy. During decompression, lossless codecs such as PKZIP and PNG reproduce a file identical to the original. Lossy compression methods, on the other hand, produce a facsimile that sounds (or looks) like the original but takes up less storage space on electronic devices.
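The lossless round trip can be illustrated with Python's standard zlib module, which implements DEFLATE, the algorithm used by the ZIP format and by PNG. This is a minimal sketch: the byte string below is an arbitrary stand-in for a file's contents, not real audio or image data.

```python
import zlib

# Arbitrary stand-in for a file's contents (an assumption for illustration).
original = bytes(range(256)) * 64

# Encoder: compress. Decoder: decompress.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# A lossless codec must give back exactly the bytes that went in.
is_identical = restored == original
ratio = len(original) / len(compressed)

print(f"identical after round trip: {is_identical}")
print(f"original {len(original)} bytes, compressed {len(compressed)} bytes ({ratio:.1f}x)")
```

The equality check is the defining property of a lossless codec; a lossy codec would fail it by design.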

Lossy audio codecs work by discarding some of the data in a digital audio stream when compressing it, so that the decompressed stream is only an approximation of the original. In general, the difference between the original file and the decompressed one is difficult or impossible for humans to perceive.
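The trade-off can be sketched with a deliberately crude stand-in for a lossy codec: rounding each sample to a coarser grid. Real codecs such as MP3 are far more sophisticated and decide what to discard using models of human hearing, so the toy signal, the quantize helper and the bit depths below are illustrative assumptions only.

```python
import math

def quantize(samples, bits):
    """Round samples in [-1, 1] to a grid of 2**bits levels (the lossy step)."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    return [round((s + 1.0) / step) * step - 1.0 for s in samples]

def rms_error(a, b):
    """Root-mean-square difference between two equally long sample lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# One second of a 440 Hz tone sampled at 8 kHz (arbitrary toy values).
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

# Fewer bits per sample means a smaller file but a larger deviation.
errors = {bits: rms_error(signal, quantize(signal, bits)) for bits in (12, 8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits:2d} bits per sample -> RMS error {err:.5f}")
```

The error grows as fewer bits are kept, which is the general pattern behind audible artifacts at aggressive compression settings.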

When lossy codecs use high compression ratios, however, they can damage audio signals and alter them significantly. Recently, computer scientists have tried to overcome this limitation of lossy codecs and improve the quality of compressed files using deep learning techniques.

Researchers at Sony Computer Science Laboratories (CSL) have recently developed a new deep learning method to improve and restore the quality of heavily compressed audio tracks and recordings (i.e., audio files that have been compressed by lossy codecs at a high compression ratio). This method, presented in a paper pre-published on arXiv, is based on generative adversarial networks (GANs), machine learning models in which two neural networks "compete" to make increasingly accurate or reliable predictions.

“Many works have addressed the problem of audio enhancement and compression artifact removal using deep learning techniques,” write Stefan Lattner and Javier Nistal in their paper. “However, only a few works address the restoration of highly compressed audio signals in the music domain. In this study, we test a stochastic generator for a generative adversarial network (GAN) architecture for this task.”

Like other GANs, the model created by Lattner and Nistal consists of two separate networks, known as the "generator (G)" and the "critic (D)". The generator takes as input an excerpt of an MP3-compressed music signal, represented as a spectrogram (i.e., a visual representation of the frequency spectrum of an audio signal over time).
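For readers unfamiliar with spectrograms, the sketch below computes a magnitude spectrogram from raw samples using a short-time Fourier transform written in plain Python. The frame size, hop size, window and test tone are assumptions chosen for illustration; the article does not state the settings used in the paper.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes for bins 0..frame_size/2."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive discrete Fourier transform of one windowed frame.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

sample_rate = 8000
tone_hz = 1000  # each bin spans 8000 / 64 = 125 Hz, so this tone sits on bin 8
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(512)]

spec = magnitude_spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(f"{len(spec)} frames x {len(spec[0])} bins, peak at bin {peak_bin}")
```

Each row of the result is one time slice and each column one frequency band, which is the two-dimensional picture shown in the figure above.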

The generator learns to produce a restored version of the original signal from this compressed input. Meanwhile, the critic learns to distinguish between high-quality original files and restored versions, thereby picking up on the differences between them. Ultimately, the feedback from the critic is used to improve the restored files, so that their musical content is as faithful as possible to the original.
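The opposing goals of the two networks can be written as a pair of loss functions. The article does not state which objective the paper uses; the sketch below assumes a Wasserstein-style formulation, with which the term "critic" is commonly associated, and all scores are made-up numbers rather than outputs of a trained model.

```python
def mean(values):
    return sum(values) / len(values)

def critic_loss(scores_real, scores_restored):
    """Assumed Wasserstein-style critic objective: score originals high, restorations low."""
    return mean(scores_restored) - mean(scores_real)

def generator_loss(scores_restored):
    """The generator is rewarded when the critic scores its restorations highly."""
    return -mean(scores_restored)

# Hypothetical critic scores for a batch of three excerpts.
scores_real = [0.9, 1.1, 1.0]       # D(original)
scores_early = [-1.0, -0.8, -1.2]   # D(G(mp3, z)) early in training
scores_late = [0.7, 0.9, 0.8]       # D(G(mp3, z)) once restorations look more real

for label, scores in (("early", scores_early), ("late", scores_late)):
    print(f"{label}: critic loss {critic_loss(scores_real, scores):+.2f}, "
          f"generator loss {generator_loss(scores):+.2f}")
```

As the restorations become harder to tell apart from the originals, the generator's loss falls and the critic's advantage shrinks, which is the "competition" that drives training.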

Lattner and Nistal evaluated their GAN-based architecture in a series of tests designed to determine whether the model could improve the quality of MP3 inputs and generate samples of higher quality, and closer to the original file, than those produced by baseline models. The results were promising: restorations of heavily compressed MP3 files (16 kbit/s and 32 kbit/s) generally sounded better to expert human listeners than the compressed files themselves. At a weaker compression setting (64 kbit/s mono), on the other hand, the team found that the model's output was slightly worse than the baseline MP3 version.

“We perform an extensive evaluation of various experiments using objective metrics and listening tests,” said Lattner and Nistal. “We find that the models can improve the quality of audio signals over MP3 versions for 16 and 32 kbit/s and that stochastic generators are capable of generating outputs that are closer to the original signals than deterministic generators.”
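What makes a generator "stochastic" is that it receives a random noise vector z alongside the compressed input, so the same MP3 excerpt can yield several plausible restorations. The toy function below is a hypothetical stand-in for the trained network, not the authors' model: it simply keeps the low-frequency bins and fills the missing upper bins from z, which is drawn from N(0, I).

```python
import random

def sample_z(dim, rng):
    """Draw z ~ N(0, I): independent standard-normal components."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_generator(mp3_frame, z, cutoff):
    """Hypothetical stand-in for G: keep bins below `cutoff`, fill the rest from z."""
    kept = mp3_frame[:cutoff]
    # The missing high-frequency magnitudes are invented from the noise vector.
    filled = [0.1 * abs(zi) for zi in z]
    return kept + filled

rng = random.Random(0)  # fixed seed so the run is repeatable
mp3_frame = [1.0, 0.8, 0.5, 0.3, 0.0, 0.0, 0.0, 0.0]  # upper bins zeroed by compression
cutoff = 4

restorations = [
    toy_generator(mp3_frame, sample_z(len(mp3_frame) - cutoff, rng), cutoff)
    for _ in range(3)
]
for r in restorations:
    print([round(v, 3) for v in r])
```

A deterministic generator would correspond to always using the same z, and would therefore return a single restoration for each input.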

As part of their study, the researchers also showed that their architecture could successfully generate and add realistic high-frequency content that improved the audio quality of compressed tracks. The generated content included percussive elements, the sibilants and plosives of a singing voice (i.e., "s" and "t" sounds), and guitar sounds.

In the future, the model they created could help to significantly reduce the size of MP3 music files without changing their content or introducing easily perceptible errors. This could have significant implications for storing and streaming music, both on streaming apps (e.g., Spotify and Apple Music) and on modern electronic devices, including smartphones, tablets and computers.
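To put the bitrates mentioned above in perspective, the following back-of-the-envelope calculation compares file sizes for a three-minute track. The track length and the comparison with uncompressed CD audio (44.1 kHz, 16-bit, stereo) are assumptions added for illustration; they are not figures from the paper.

```python
# Uncompressed CD audio: sample rate x bit depth x channels = 1,411,200 bit/s.
CD_BITS_PER_SECOND = 44_100 * 16 * 2

def size_bytes(bits_per_second, seconds):
    """File size of a constant-bitrate stream, ignoring container overhead."""
    return bits_per_second * seconds // 8

duration = 180  # a three-minute track (assumed)
cd_size = size_bytes(CD_BITS_PER_SECOND, duration)
sizes = {kbps: size_bytes(kbps * 1000, duration) for kbps in (16, 32, 64)}

print(f"CD audio: {cd_size / 1e6:.2f} MB")
for kbps, size in sizes.items():
    print(f"{kbps} kbit/s MP3: {size / 1e6:.2f} MB ({cd_size / size:.1f}x smaller)")
```

Halving the bitrate halves the file size, which is why usable restorations at 16 or 32 kbit/s would matter for storage and streaming.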

More information:
Stefan Lattner, Javier Nistal, Stochastic restoration of heavily compressed musical audio using generative adversarial networks. arXiv:2207.01667v1 [cs.SD]

© 2022 Science X Network

citation: Using a GAN Architecture to Restore Heavily Compressed Music Files (2022, August 31) Retrieved August 31, 2022 from html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.