13. Much of the training data is biased, harmful, or unsafe
You can’t fully trust the output of generative AI models because of the massive size of their training data sets.
Summary
The data sets used to train AI models are unevenly distributed, biased, unsafe, and vulnerable to manipulation. A model’s effectiveness depends on the size of its training data, and for many use cases there simply isn’t enough data available legally. A well-resourced attacker can inject results they want, introduce more biases, hide results, and introduce false positives. The behaviour of many AI systems can be manipulated at a relatively low cost. Discovering these attacks is impossible as long as the training data the vendors use remains undocumented.