If you think of FLAC as the audiophile's best friend for lossless music files, a large language model (LLM) has news for you: AI is now laying claim to compression as part of its growing realm of influence, too.
A study titled "Language Modeling Is Compression" (via Ars Technica) describes how Chinchilla 70B, an LLM from DeepMind, can perform lossless data compression better than FLAC for audio and PNG for images.
Chinchilla 70B could significantly shrink image patches from the ImageNet database, reducing them to only 43.4% of their original size without losing any detail. That beats the PNG algorithm, which could only reduce the same images to 58.5% of their original size.
Additionally, Chinchilla compressed audio samples from the LibriSpeech dataset to just 16.4% of their original size. This is impressive, especially compared with FLAC, which could only reduce the audio to 30.3%.
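The percentages quoted here are compressed size divided by original size, so lower is better. As a rough illustration of that arithmetic, here is a minimal Python sketch. It uses the standard library's zlib as a stand-in compressor (not Chinchilla, PNG, or FLAC), and the `compression_rate` helper and the sample text are made up for this example.

```python
import zlib


def compression_rate(raw: bytes, compressed: bytes) -> float:
    """Return the compressed size as a percentage of the original size."""
    if not raw:
        raise ValueError("raw data must not be empty")
    return 100.0 * len(compressed) / len(raw)


# Hypothetical sample data: highly repetitive text, which compresses well.
sample = b"the quick brown fox jumps over the lazy dog. " * 200

# zlib is only a stand-in here; 9 is its highest compression level.
packed = zlib.compress(sample, 9)

rate = compression_rate(sample, packed)
print(f"{len(sample)} bytes -> {len(packed)} bytes ({rate:.1f}% of original)")
```

By the same measure, a file that shrinks from 1,000 bytes to 164 bytes has a rate of 16.4%.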
Lossless compression means nothing is lost or left out when data is squeezed into smaller packages. This differs from lossy compression, which is what the image format JPEG uses. Lossy compression discards some data and then approximates what it should look like when you open the file again, all to make the file size that much smaller.
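The difference is easy to check in code. The sketch below, again using Python's zlib as a stand-in, confirms that a lossless round trip returns the exact original bytes. The "lossy" step is a deliberately crude toy (it throws away the low four bits of every byte) and is only an assumption for illustration; it is not how JPEG works.

```python
import zlib

# Hypothetical input: every byte value from 0 to 255, repeated 64 times.
original = bytes(range(256)) * 64

# Lossless: compress, decompress, and compare byte for byte.
restored = zlib.decompress(zlib.compress(original))
lossless_ok = restored == original

# Toy lossy step: zero the low 4 bits of each byte. That detail is gone
# for good, so the original can no longer be reconstructed exactly.
quantized = bytes(b & 0xF0 for b in original)
lossy_ok = quantized == original

print("lossless round trip identical:", lossless_ok)
print("toy lossy version identical:  ", lossy_ok)
```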
The study's findings show that even though Chinchilla 70B was mostly made to work with text, it is also surprisingly adept at making other types of data much smaller, and is often better at it than programs specifically made to do so.
The study's researchers suggest that prediction and compression go both ways. This means if you have a good tool for making data smaller, like gzip, you can also use it to create new information based on