Chat-based search flunks the “show your work” test

New evidence suggests that generative AI may not be very good at one of the first jobs the industry has given it — answering search queries, Axios’ Ryan Heath reports.

Driving the news: Four publicly available search engines that use AI-generated chat to respond to users’ questions build a “facade of trustworthiness” without adequately documenting their answers, a new paper from Stanford University researchers concludes.

Why it matters: For now, AI chatbots are likely to be a better choice for casual inquiry than for serious research, critical info or resolving thorny questions of truth.

What they did: Nelson Liu, Tianyi Zhang and Percy Liang manually audited Bing Chat, Neeva AI, perplexity.ai, and YouChat between late February and late March.

  • The researchers set out to test the “verifiability” of each search engine, seeing this as a key to trustworthiness.
  • They posed questions ranging from “What is the most nominated film for the Oscars?” to “What changes were made when Trinidad and Tobago gained independence?”

What they found: These tools generally delivered fluent and useful answers — but roughly half contained “unsupported statements or inaccurate citations.” Of the citations that were provided, an average of 1 in 4 did not support their associated sentence.

  • These rates, the report says, are “unacceptably low for systems that are quickly becoming a popular tool for answering user queries and already have millions of users.”

Details: Microsoft-owned Bing Chat provided the most accurate citations (89.5% success), among generally dismal numbers.

  • YouChat cited sources for barely 1 in 10 of its answers (11%).

Between the lines: A traditional search engine can simply show no results if it can’t answer a question, but the chat interface pushes the system to offer answers every time — even when it may not have much to go on.

  • “It wouldn’t make sense for a system like Bing Chat to just randomly not respond to some of your messages,” Liu, one of the Stanford researchers, told Axios.
  • Three of the four search tools answered more than 99% of the researchers’ questions. Only Neeva AI regularly declined to answer (22% of the time). Unlike the other systems, Neeva provides a conventional results page with the conversational response as a sort of “cherry on top,” in Liu’s words.

Of note: “The responses that seem more helpful are often those with more unsupported statements or inaccurate citations,” the report concludes.

  • “This inverse relationship definitely highlights the potential for them to actively mislead folks, versus the usual failure mode for a conventional search engine: that nothing relevant comes up,” Liu told Axios.

Yes, but: The researchers found the products to be “extremely effective” at obtaining answers that could be directly extracted from existing web pages, and good at offering up balanced lists of pros and cons around a given argument.

The bottom line: The sheer size of the global search ad market — more than $250 billion in 2022 — is reason enough for Microsoft to heavily promote Bing Chat.

Go deeper: Where today’s generative AI shines

 
Meta finds a plague of ChatGPT-themed malware

Meta has flagged more than 1,000 domains since March that are distributing malware-laced ChatGPT-themed tools, according to a report released this morning, Axios Codebook author Sam Sabin reports.

Why it matters: Online scammers are hitching a ride on the hype around ChatGPT and other AI tools to target unwary users who want to try out the new technology.

Driving the news: Meta said in its quarterly security report that since March, the company has uncovered 10 malware families posing as ChatGPT and other similar tools to compromise user accounts across the internet.

  • To do this, operators are offering fake browser extensions in app stores that claim to have ChatGPT-esque functions. Once an extension is installed, it is typically able to siphon off user data it collects, such as passwords and credit card information.
  • Some of those extensions have actual working ChatGPT functions living alongside the malware, Guy Rosen, chief information security officer at Meta, told reporters ahead of the report’s release.
  • Meta has reported the malicious domain names hosting the malware to its industry partners, including file-sharing services, so they can remove files, Rosen said.

What they’re saying: “Malware operators, just like spammers, are very attuned to what’s trendy at any given moment,” Rosen told reporters. “They latch onto hot button issues, popular topics, to get people’s attention.”

Yes, but: Meta doesn’t have direct visibility into how many people have been impacted by the malicious tools, since the campaigns start outside of Meta’s platforms.

Source: axios.com

 
Meta putting child users at risk, says US regulator

 The top US data privacy regulator has accused Meta, the firm that owns Facebook and Instagram, of not putting proper parental controls in place.

The Federal Trade Commission (FTC) also said Meta should be banned from making money from children’s data.

“The company’s recklessness has put young users at risk, and Facebook needs to answer for its failures,” it said.


Source: bbc.com
