
13. Much of the training data is biased, harmful, or unsafe

You can’t fully trust the output of generative AI models because of the massive size of their training data sets.

[Illustration: “She feels like a goth, this one. A body of patterns and noise. Defined with lines. Circles provide the highlights.”]

Summary

The data sets used to train AI models are unevenly distributed, biased, unsafe, or vulnerable to manipulation. A model’s effectiveness depends on the size of its training data set, and for many use cases, there simply isn’t enough data available legally. A well-resourced attacker can inject desirable results, introduce additional biases, hide results, and create false positives. The behaviour of many AI systems can be manipulated at a relatively low cost, and discovering these attacks is impossible as long as the training data the vendors use remains undocumented.

