What role should experts play in evaluating Gemini?

Google Gemini

TL;DR

  • Google recently overhauled how it instructs contractors to evaluate AI responses.
  • Reviewers now have less leeway to skip prompts that fall outside their specific expertise.
  • Google defends its use of this data, noting that reviewer ratings cover many factors beyond technical accuracy and do not directly feed its algorithms.

When it comes to controversies surrounding AI, the “human element” often emerges as a counterargument. Worried that AI will take your job? Well, someone still has to program the AI, curate the data set that trains it, and analyze its output to make sure it isn’t spewing complete nonsense, right? The problem is that human oversight only goes as far as the companies behind these AI models want it to, and a new report raises some troubling questions about where that line sits for Google and Gemini.

Google outsources some of the work of improving Gemini to companies like GlobalLogic, as TechCrunch describes. Among other things, reviewers are asked to rate the quality of Gemini’s answers, and in the past that has come with instructions to skip prompts outside the reviewer’s knowledge base: “If you do not have critical subject knowledge (e.g. coding, mathematics) for assessment, please skip this task.”

At first glance, this seems like a pretty sensible guideline that helps minimize the impact non-experts could have in steering AI responses the wrong way. But as TechCrunch found out, this has recently changed, and the new rules GlobalLogic gives its contributors instruct them to “not skip any prompts that require specific domain knowledge” and to go ahead and at least “evaluate the parts of the prompt that you understand.” At the very least, they are asked to enter a note in the system that the rating was given despite their lack of expertise.

While there is a lot of value in evaluating an AI’s responses on dimensions beyond whether highly technical information is accurate, complete, and relevant, it’s easy to see why such a policy change might be cause for concern: it can feel like lowering standards in order to process more data. Some of the people tasked with analyzing this data apparently shared the same concerns, according to internal chats.

Google offered TechCrunch this statement from spokesperson Shira McNamara:

Reviewers perform a variety of tasks across many different Google products and platforms. They not only review answers for content, but also provide valuable feedback on style, format, and other factors. The ratings they provide do not directly impact our algorithms, but taken as a whole they are a useful data point to help us measure how well our systems are performing.

That’s largely in line with our read of what seems to be going on here, but we’re not sure it will be enough to dispel doubts among the AI-skeptical public. Because human oversight is critical to curbing unwanted AI behavior, any suggestion that standards are being lowered is only going to be met with concern.

Do you have a tip? Talk to us! Email our staff at [email protected]. You can remain anonymous or receive credit for the information; the choice is yours.
