Google’s Gemini forces contractors to evaluate AI responses that are outside their expertise

Generative AI may seem magical, but behind the development of these systems are legions of employees at companies like Google, OpenAI and others: so-called “prompt engineers” and analysts who rate the accuracy of chatbot output to improve the AI.

But a new internal policy that Google shared with contractors working on Gemini, seen by TechCrunch, has raised concerns that Gemini could be more prone to giving everyday people inaccurate information on highly sensitive topics like healthcare.

To improve Gemini, contractors working with GlobalLogic, an outsourcing company owned by Hitachi, are regularly asked to rate AI-generated answers based on factors such as “truthfulness.”

These contractors were, until recently, able to “skip” certain prompts, thereby opting out of evaluating various AI-written responses to those prompts if the prompt was well beyond their expertise. For example, a contractor might skip a prompt that asked a niche question about cardiology because the contractor had no scientific background.

But last week, GlobalLogic announced a change from Google that would no longer allow contractors to skip such prompts, regardless of their own expertise.

Internal correspondence seen by TechCrunch shows that the guidelines previously stated: “If you do not have critical expertise (e.g., coding, math) to evaluate this prompt, please skip this task.”

But now the guidelines say, “You should not skip prompts that require specific domain knowledge.” Instead, contractors are asked to “evaluate the portions of the prompt that you understand” and add a note that they do not have domain knowledge.

This has raised direct concerns about Gemini’s accuracy on certain topics, as contractors are sometimes tasked with evaluating highly technical AI responses on topics such as rare diseases that they have no experience with.

“I thought the point of skipping was to increase accuracy by giving it to someone better?” a contractor mentioned in internal correspondence, seen by TechCrunch.

According to the new guidelines, contractors can now skip prompts in only two cases: if the prompts “completely lack information such as the full prompt or response,” or if they contain harmful content that requires special consent forms to evaluate.

Google did not respond to TechCrunch’s requests for comment by press time. After this story was published, Google, which did not dispute our reporting, told TechCrunch that the company is “continuously working to improve the factual accuracy of Gemini.”

“Assessors perform a wide range of tasks across many different Google products and platforms,” said Google spokeswoman Shira McNamara. “They not only review answers for content, but also provide valuable feedback on style, format and other factors. The ratings they provide do not directly impact our algorithms, but taken as a whole they are a useful data point to help us measure how well our systems are performing.”

Updated with post-publish comment from Google.

You can safely send tips to this reporter on Signal at 628-282-2811.
