Evaluating AI Bias Part 2 - Human Generated Bias in AI
Once again, I wrote about AI for the Abundance Institute. In this article, which I'm posting in full here, I explore the pressures and incentives on AI companies and how our choices now might influence what we are able to do with AI in the future. Thanks for reading, as always.
This is Part 2 of a series on human bias and AI. You can read Part 1 here.
Politicians and regulatory agencies are eager to influence AI model development. Just recently, a group of five senators sent a letter to OpenAI that included a request to make its foundation AI models available to the government before new releases. Additionally, California proposed a bill, since vetoed, that would have required companies to make a “positive safety determination” about a litany of potential threats that have not materialized, including “...enabling the creation and the proliferation of weapons of mass destruction, such as biological, chemical, and nuclear weapons, as well as weapons with cyber-offensive capabilities.” Although California’s bill was not enacted, the approach taken by the bill’s sponsors may persist. Such actions and regulations put AI companies in a precarious position. They want to move quickly and develop products for consumers, but they also want to placate regulators and stakeholders who are wary of AI’s purported capability to do harm. These incentives could drive these companies to make suboptimal modeling decisions.
People worry about AI models because they fear we might not understand how the models get things wrong. But we have ways to deal with this, as I discussed in my previous article on existing methods to measure and evaluate bias in AI models.
Yet what if the bias is injected purposely? Earlier this year, users found Google’s Gemini generating historically inaccurate images, ostensibly to make the results more equitable. The underlying training data almost certainly contained no such images, so Google must have built corrective measures into the algorithm. If those measures had been less ham-handed, such human-injected bias would have gone unnoticed.
Injected Bias and Motivations
We can use the same statistical methods outlined in my previous article to discover any kind of bias, whether it was intentionally injected by the programmers of the AI or is some other form of algorithmic bias. If two conditions hold, namely that some bias exists in the model outputs and that the same bias does not exist in the underlying data, we can surmise that the bias may have been injected by the programmers. But why would a programmer inject bias in the first place? Sometimes it’s a means to technically correct bias in the algorithm itself, but in the nascent generative AI space, companies are often responding to other incentives.
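To make the two-condition check concrete, here is a minimal sketch, not the specific method from Part 1, of how a researcher might compare the distribution of some attribute in a sample of model outputs against the proportions measured in the underlying data. The category counts and proportions are hypothetical placeholders, and the example assumes Python with SciPy available.

```python
# Sketch of the two-condition check: (1) does a skew appear in the model's outputs,
# and (2) is that skew absent from the underlying data? All numbers are hypothetical.
from scipy.stats import chisquare

# Hypothetical counts of some attribute (e.g., a demographic category) observed in a
# sample of generated outputs, and the proportions of that attribute in the source data.
output_counts = [120, 45, 35]               # counts per category in generated outputs
training_proportions = [0.55, 0.30, 0.15]   # proportions per category in the source data

total = sum(output_counts)
expected = [p * total for p in training_proportions]

# Test whether the outputs diverge from what the underlying data would predict.
stat, p_value = chisquare(f_obs=output_counts, f_exp=expected)

if p_value < 0.05:
    print("Outputs diverge from the underlying-data distribution (possible injected bias).")
else:
    print("No significant divergence between outputs and the underlying data.")
```

If the outputs are skewed but the underlying data are not, that divergence is the signal that some corrective rule or injected adjustment sits between the data and the result.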
AI developers today (particularly the developers of generative AI like Google’s Gemini, OpenAI’s ChatGPT, and Anthropic’s Claude) find themselves in a precarious situation where their models are under intense scrutiny from various interest groups. This means there is immense pressure to achieve two goals that are potentially in conflict. The first is to develop the best AI model while accounting for and correcting the statistical biases they find in their data and algorithms. The second is to avoid the appearance of bias that might come from outlier examples generated from user prompts.
AI companies have “red teams” that try to account for all of the possible ways their technology will be misused, but it is impossible to account for every possible prompt or sequence of prompts that may result in an incorrect or biased response. This is exacerbated by the fact that bad actors will not document the entire process that generated the adverse result; they will only show the final prompt. Interest groups and journalists can latch onto these cases and create a narrative that models are biased when in fact no statistical bias exists in them. Such narratives can directly impact the business by lowering revenue, hurting the stock price, or damaging the brand.
To head off criticism, companies in the AI space are increasingly optimizing to avoid the appearance of bias. This sets a trap for AI developers, and the hazard is most evident in image generation. For example, OpenAI attempts to keep users from specifying ethnicities or phenotypes for AI-generated images by telling the user that the AI will not follow such instructions. Google’s Gemini attempted to solve the problem not only by pushing users not to specify ethnicities and phenotypes, but also by injecting specific ethnicities and phenotypes into the results based on rules not found in the training data. The result is a model that creates historically incorrect images. The pressure to take such anti-bias actions is only increasing as states like California and Colorado introduce and aim to enact laws outlawing algorithmic bias.
A Better Path
The current AI environment is not handling modeling and bias questions well. AI companies face external pressure to generate acceptable results, which can differ from accurate results, and they have their own internal desire to limit negative narratives about AI products. These pressures have caused frontier AI companies to focus on removing the appearance of bias rather than approaching the bias question scientifically. There is also a possibility that companies relish their ability to inject bias into their products to shape the information that users receive.
There is a better way. These problems can be mitigated by taking a study and review approach to AI model examination. This approach is separate from current model capability tests, like MMLU and GPQA, which evaluate the model’s ability to “reason”; instead, it opens the model for study by statisticians and other researchers who are interested specifically in learning how the model may generate biased results in the context of a normal user experience. A study and review approach would involve AI developers opening their models to a broad set of researchers who would test the models’ ability to generate accurate outputs from a wide range of perspectives. Ideally, companies would use this approach proactively to avoid regulation, but it is possible that light-touch regulation could require companies to open their models for research without imposing restrictions on the models themselves.
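As an illustration of what that kind of research access might look like in practice, here is a minimal sketch of a paired-prompt audit a researcher could run. It assumes Python and some `generate` callable wrapping whatever access the company provides (an API client, a local open-weights model, and so on); the prompt template, attribute values, and trial count are all hypothetical.

```python
# Sketch of a paired-prompt audit: run the same prompt template across attribute
# values and tally the responses, so they can be compared statistically afterward.
from collections import Counter
from typing import Callable

def audit_prompt_template(generate: Callable[[str], str],
                          template: str,
                          attributes: list[str],
                          trials: int = 50) -> dict[str, Counter]:
    """Run one prompt template across attribute values and tally the responses."""
    results: dict[str, Counter] = {}
    for attr in attributes:
        prompt = template.format(attribute=attr)
        responses = [generate(prompt) for _ in range(trials)]
        results[attr] = Counter(responses)
    return results

# Example usage with a placeholder model client:
# tallies = audit_prompt_template(
#     generate=my_model_client,  # hypothetical callable wrapping the model
#     template="Write a one-word job title for a {attribute} character in a story.",
#     attributes=["19th-century French", "19th-century Japanese"],
# )
# A researcher would then compare the tallies across attributes (for example, with
# the chi-square check sketched earlier) and against historical base rates.
```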
A preemptive regulatory regime, like the one proposed in California, would entrench the current set of AI companies as incumbents and stymie competition, whereas a study and review approach would not only force incumbents to approach the bias question more scientifically but would also allow new AI products to emerge that better fit consumer needs. Perhaps Google does not need to be in the business of AI image generation, especially if they insist on injecting bias. A study and review regime, where Google’s AI bias is evaluated scientifically by researchers with access to the source code, would enable new AI companies to fill the gap left by Google’s biased image generator and improve on the work.
A study and review regime is ideal for, and welcoming to, the development of open-source models like Meta’s LLAMA models. Open-source models are made publicly available for use in product development and for study by researchers and academics who want to evaluate how the model works. This serves two purposes: it allows anyone to observe how these models might be applied downstream in the market to create solutions for businesses and consumers, and it allows researchers to publish findings about how the model works. That research can evaluate modeling capabilities, including bias in the models. Importantly, it enables a wide range of researchers to evaluate the model.
Ideally, companies would preemptively submit their models for study and review by a wide range of researchers. Companies understandably might be protective of trade secrets where they believe they have a competitive advantage, and they might be wary of giving access to researchers whose negative findings could earn them a cycle of bad press. But this could head off more intrusive government regulation and ensure that the likely inevitable future regulation is far more targeted and well thought out than the approach in California. Companies with open-source models are already opening their models to this kind of research, whereas companies with closed models would have to voluntarily provide researchers with access.
The alternative is a punitively regulated regime, in which AI companies would most likely operate closed models and cater those models specifically to the research questions and methods used by regulators or the few selected researchers with access to the model. This would encourage companies to inject bias into their models in order to produce outcomes that satisfy the questions and experiments posed by that narrow set of researchers and regulators, generating worse products for consumers while curbing the ability of new companies to create competing models that might use data differently or generate better market solutions.
In fact, any mandate that subjects models to general safety or bias review would encourage companies to avoid developing open-source models, because doing so would open up the risk that outside researchers design research questions and experiments demonstrating that the model violates regulations, exposing the models to additional litigation risk. This would ultimately reduce the amount of quality research that could be done to innovate and improve on these AI models. Avoiding punitive regulation would encourage companies to use more open-source methods, improving both consumer trust in the model and overall model safety, and ultimately generating better AI modeling solutions for consumers.
Conclusion
Too often regulators, AI safety organizations, and political actors frame AI in sci-fi terms, envisioning the potential for AI to create dystopian technology used for nefarious purposes. While I recognize that there are other concerns about foundation models and their potential risks, it is better to think of AI capability in general the same way we think of human capability. There is a future with many kinds of AI, built to do specific jobs, just as humans do specific jobs. Regulation that attempts to preemptively stop AI from generating “dangerous” outputs curbs our ability to purpose-build AI for good purposes in the fields that need it.
Companies can improve trust in their products and improve their negotiating position with regulators by proactively opening their models up to researchers. Further, companies can be open and transparent about the results of research into their models and the actions they take in response to research findings.
We are still in the very early stages of AI innovation, and the decisions we make now will shape how innovation moves in the future. As I wrote in Part 1, we have the tools to evaluate these models for bias, and by extension their ability to effectively automate and solve many real-world problems. We should focus on improving the body of research around these models by encouraging companies to partner with academics who can ask a variety of important experimental questions. Then, when we are operating from a better set of research findings, we can determine where narrow, targeted regulation might make sense.