An example of model collapse in an image-generating AI. (FreeThink, "'Model collapse' threatens to kill progress on generative AIs")
The image introduces how an AI can circulate the same data over and over, eventually corrupting it entirely. When data travels through wires, there is always some kind of disturbance. Electromagnetic and physical anomalies can flip a bit or two of the data in transit. The loss of one or two bits doesn't mean much on its own. But when the same data circulates through the system thousands of times, those small errors accumulate into a serious change in the data flow. Sooner or later, the data mass is corrupted and turned into something else.
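The accumulation effect is easy to see in a toy simulation. This sketch (the flip probability and message size are my own assumptions, chosen only for illustration) copies the same bit string over and over, flipping each bit with a tiny probability on every pass, the way transmission noise might:

```python
import random

def noisy_copy(bits, flip_prob=0.001):
    """Copy a bit string, flipping each bit with a small probability.
    A toy model of transmission noise; flip_prob is an assumption."""
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

random.seed(0)
data = [0] * 10_000  # the original message: all zeros

one_pass = noisy_copy(data)          # a single trip through the wire
after_many = data
for _ in range(1000):                # the same data recirculated 1000 times
    after_many = noisy_copy(after_many)

print("bits wrong after 1 pass:    ", sum(one_pass))
print("bits wrong after 1000 passes:", sum(b != 0 for b in after_many))
```

One pass damages only a handful of bits, but a thousand recirculations corrupt a large fraction of the message, which is the core of the argument above.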
That kind of data corruption is the same thing as model collapse in a model's data flow.
When we use large language models (LLMs), we use computer programs that can search and connect data using certain algorithms. The algorithm searches for details and then connects data from sources that fit certain rules. That makes it possible for an LLM to produce almost-passable academic texts and translate web pages. The problem with the LLM is that it doesn't think.
The LLM sees letters as numeric codes, such as ASCII codes, and searches for similarities between the codes it finds on web pages. So the LLM doesn't know the meaning of the words it uses to generate new texts. It collects words that look like the ones in the search field, and then connects those words into a new whole.
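A deliberately naive sketch can make this point concrete. Real systems use tokenization and learned embeddings rather than raw character codes, so treat this purely as an illustration of the blog's point: the program only compares numbers, never meanings.

```python
def to_codes(text):
    """Represent text as numeric character codes (ASCII/Unicode code points)."""
    return [ord(c) for c in text]

def code_similarity(a, b):
    """A naive similarity score: the fraction of positions whose codes match.
    The 'model' sees only numbers here, not meanings."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

query = to_codes("model collapse")
candidates = ["model collapse", "model railways", "cake recipe"]
ranked = sorted(candidates,
                key=lambda s: code_similarity(query, to_codes(s)),
                reverse=True)
print(ranked)  # strings are ranked by surface resemblance, not by meaning
```

Nothing in this code understands what "collapse" means; it only measures how similar the numbers look, which is the weakness the paragraph above describes.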
In that mode, the LLM starts to circulate its own data. That means the LLM recycles texts that it made itself, and that recycling is what corrupts the data. When an LLM starts to use its own texts and images as sources, that can cause problems in the data the LLM relies on.
One of the biggest problems is that an AI can start to recycle its own texts, or it can interconnect data that other AIs have generated. And when the AI generates data, it just connects sources into new wholes.
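This self-recycling loop can also be simulated with a toy "model" whose training is just memorizing its corpus and whose generation is just sampling from it. The setup is my own simplification, but it shows the same diversity loss that the model-collapse literature describes: each generation trained only on the previous generation's output keeps fewer and fewer distinct facts.

```python
import random

def retrain_on_own_output(corpus, size):
    """A toy 'model': training memorizes the corpus, generation samples
    from it with replacement. Each generation trains only on the
    previous generation's synthetic output."""
    return random.choices(corpus, k=size)

random.seed(2)
corpus = list(range(50))  # 50 distinct "facts" in the real training data

diversity = [len(set(corpus))]
for _ in range(100):
    corpus = retrain_on_own_output(corpus, 50)
    diversity.append(len(set(corpus)))

print("distinct facts in generation 0:  ", diversity[0])
print("distinct facts in generation 100:", diversity[-1])
```

Because each generation can only repeat values that already exist in the previous one, the number of distinct facts can never recover once something is dropped; the corpus drifts toward a few endlessly recycled items.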
The AI doesn't know what words mean. And that makes it easy to manipulate its answers by changing the content of those web pages.
The problem is this: when the AI selects sources, it uses the web pages returned by a search engine. The search engine cannot tell whether a click comes from an LLM or from a human, so an LLM can raise a web page's ranking. It's possible. Some pages contain cool-looking information, and merely having a scientific-looking heading or title in the search path can push up a page that has nothing to do with the thing the student is actually searching for.
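Here is a minimal sketch of that feedback loop, assuming a ranker that treats every click as an equal relevance signal (real search engines are far more sophisticated, and the page names are hypothetical). Automated fetches from a pipeline then count exactly like human interest:

```python
from collections import Counter

clicks = Counter()  # a toy engagement-based ranking signal

def record_click(page):
    """The ranker cannot tell a human click from an automated one."""
    clicks[page] += 1

def ranked_results():
    """Pages ordered by click count, most-clicked first."""
    return [page for page, _ in clicks.most_common()]

# A few humans click the genuinely relevant page...
for _ in range(5):
    record_click("university.example/real-research")

# ...while an automated pipeline fetches a science-sounding junk page repeatedly.
for _ in range(50):
    record_click("junk.example/quantum-miracle-cure")

print(ranked_results()[0])  # the junk page now outranks the real source
```

Once the junk page sits at the top, the next AI that queries the search engine pulls it in as a source, closing the loop the paragraph describes.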
And that makes the AI produce text that has nothing to do with the studies. In the worst case, a person who doesn't know the topic posts that AI-created junk on Wikipedia or somewhere similar, and the AI then starts to recycle that data. In other cases, the AI starts to connect data that other AIs created. In all of these cases, the data that the AI connects and rewrites starts to corrupt.
https://www.freethink.com/robots-ai/model-collapse-synthetic-data