Wired recently published an interesting article on Model Collapse in generative AI. But what does it mean, and is it truly a problem? Let's find out together.
Generative AI is now pervasive in our daily tasks, both professional and personal. Thousands of pieces of content are produced every day with tools like ChatGPT and Midjourney, and subsequently published online.
We also know broadly how generative AI works: large initial datasets are assembled, and the model learns to produce output that resembles its input data. A considerable portion of this training data is sourced from the web, and according to the study 'The Curse of Recursion: Training on Generated Data Makes Models Forget' by Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Nicolas Papernot, and Ross Anderson, this could lead to Model Collapse.
Model Collapse is defined as a degenerative process in which content generated by LLMs (Large Language Models) pollutes the training datasets of future generations of AI.
In simpler terms, model collapse occurs when future AIs are trained on data generated by previous AIs. But what's the issue?
Even today, the content generated by AI isn't always flawless. This is partly because input data scraped from the web must be curated and selected from a mass of content that is not always truthful. Moreover, AI tends to produce statistically common responses, which leads to a progressive flattening of vocabulary and content: exceptions and special cases disappear.
If future AIs are trained on outputs that already carry biases and errors, their own outputs will only compound those problems.
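This feedback loop can be sketched with a toy statistical simulation (a hypothetical illustration for this article, not the paper's actual experiment): each "generation" fits a simple Gaussian model to samples drawn from the previous generation's model, mimicking an AI trained only on its predecessor's output. Over many generations the estimated spread tends to drift toward zero, i.e. the tails of the distribution, the rare and special cases, vanish.

```python
import random
import statistics

def simulate_collapse(generations=500, sample_size=10, seed=0):
    """Toy model of recursive training: each generation refits a
    Gaussian to samples generated by the previous generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    history = [(mu, sigma)]
    for _ in range(generations):
        # Draw a finite sample from the current model's distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # Refit on generated data only; information in the tails is lost.
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        history.append((mu, sigma))
    return history

history = simulate_collapse()
print("generation 0 spread:", round(history[0][1], 3))
print("final generation spread:", round(history[-1][1], 3))
```

With a small sample size per generation, the fitted spread typically shrinks dramatically over hundreds of iterations: the model "forgets" the variety of the original data, which is exactly the flattening effect described above.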
It's crucial to pay attention to this issue, especially considering that AI is integrating ever more deeply into our lives; consider, for example, the assistants on our smartphones and personal computers.
We'll need to find solutions that measure up to these challenges, especially because AI has only just entered our lives and will likely remain with us for a long time.