
13. Much of the training data is biased, harmful, or unsafe

You can’t fully trust the output of generative AI models because of the massive size of their training data sets.

[Illustration: “She feels like a goth, this one. A body of patterns and noise. Defined with lines. Circles provide the highlights.”]

Summary

The data sets used to train AI models are unevenly distributed, biased, unsafe, or vulnerable to manipulation. A model’s effectiveness depends on the size of its training data set, and for many use cases, there simply isn’t enough data available legally. A well-resourced attacker can inject desirable results, introduce additional biases, hide results, and create false positives. The behaviour of many AI systems can be manipulated at a relatively low cost, and discovering these attacks is impossible as long as the training data the vendors use remains undocumented.

