TL;DR
- Google has recently revised how it instructs contractors to evaluate AI responses.
- Reviewers now have far less leeway to skip prompts simply because they lack expertise in a given topic.
- Google defends its interest in this data, pointing to the wide array of factors that shape the feedback it’s looking for.
Whenever we’re talking about controversies surrounding AI, the “human element” often comes up as a counter-argument. Worried about AI taking your job? Well, someone’s still got to code the AI, administer the dataset that trains it, and analyze its output to make sure it’s not spouting complete nonsense, right? The problem is, that human oversight only goes as far as the companies behind these AI models are interested in taking it, and a new report raises some concerning questions about where that line is for Google and Gemini.
Google outsources some of the work on improving Gemini to companies like GlobalLogic, as outlined by TechCrunch. Part of that work involves asking reviewers to evaluate the quality of Gemini responses, and historically, the guidelines have told them to skip questions outside their knowledge base: “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.”
That seems like a pretty reasonable guideline on its face, helping to minimize the impact non-experts might have on steering AI responses in the wrong direction. But as TechCrunch found out, that’s recently changed: the new rules GlobalLogic is sharing with its contributors direct them to “not skip prompts that require specialized domain knowledge” and instead “rate the parts of the prompt you understand.” Contributors are at least asked to enter a note in the system acknowledging that the rating was made despite their lack of expertise.
While there’s a lot worth evaluating about an AI’s responses beyond just “is this very technical information accurate, complete, and relevant,” it’s easy to see why a policy change like this could be cause for concern — at the very least, it feels like lowering standards in an effort to process more data. Some of the people tasked with evaluating this data apparently shared those very same concerns, according to internal chats.
Google offered TechCrunch this explanation, from spokesperson Shira McNamara:
Raters perform a wide range of tasks across many different Google products and platforms. They do not solely review answers for content, they also provide valuable feedback on style, format, and other factors. The ratings they provide do not directly impact our algorithms, but when taken in aggregate, are a helpful data point to help us measure how well our systems are working.
That largely matches our read on what seemed to be going on here, but we’re not sure it will be enough to assuage all doubts from the AI-skeptical public. With human oversight so critical to reining in undesirable AI behavior, any suggestion that standards are being lowered is only going to be met with concern.