On Wednesday, Google previewed what could be one of the largest changes to the search engine in its history.
Google will use AI models to combine and summarize information from around the web in response to search queries, a product it calls Search Generative Experience.
Instead of “ten blue links,” the phrase that describes Google’s usual search results, Google will show some users paragraphs of AI-generated text and a handful of links at the top of the results page.
The new AI-based search is being tested with a select group of users and isn’t widely available yet. But website publishers already worry that if it becomes Google’s default way of presenting search results, it could hurt them by keeping users on Google.com and sending fewer visitors to their sites.
The controversy highlights a long-running tension between Google and the websites it indexes, with a new artificial intelligence twist. Publishers have long worried that Google repurposes their content verbatim in snippets on its own website, but now Google is using advanced machine learning models that are “trained” on text scraped from large parts of the web to spit out human-like text and responses.
Rutledge Daugette, CEO of TechRaptor, a site focusing on gaming news and reviews, said Google’s move was made without considering the interests of publishers and that Google’s AI amounts to lifting content.
“Their focus is on zero-click searches that use information from publishers and writers who spend time and effort creating quality content, without offering any benefit other than the potential of a click,” Daugette told CNBC. “Thus far, AI has been quick to reuse others’ information with zero benefit to them, and in cases like Google, Bard doesn’t even offer attribution as to where the information it’s using came from.”
Luther Lowe, a longtime Google critic and chief of public policy at Yelp, said Google’s update is part of a decades-long strategy to keep users on the site for longer, instead of sending them to the sites that originally hosted the information.
“The exclusionary self-preferencing of Google’s ChatGPT clone into search is the final chapter of bloodletting the web,” Lowe told CNBC.
According to Search Engine Land, a news website that closely tracks changes to Google’s search engine, the AI-generated results are displayed above the organic search results in testing so far. CNBC previously reported Google’s plans to redesign its results page to promote AI-generated content.
SGE comes in a differently colored box — green in the example — and includes boxed links to three websites on the right side. In Google’s primary example, all three of the website headlines were cut off.
Google says the information isn’t taken from the websites, but is instead corroborated by the links. Search Engine Land said the SGE approach was an improvement and a “healthier” way to link than Google’s Bard chatbot, which rarely linked to publisher websites.
Some publishers are wondering if they can prevent AI firms such as Google from scraping their content to train their models. Companies such as the firm behind Stable Diffusion are already facing lawsuits from data owners, but the right to scrape web data for AI remains an undecided frontier. Other companies, such as Reddit, have announced plans to charge for access to their data.
Leading the charge in the publishing world is Barry Diller, Chairman of IAC, which owns websites including All Recipes, People Magazine and The Daily Beast.
“If all the world’s information is able to be sucked up into this maw and then essentially repackaged in declarative sentences, in what’s called chat, but it isn’t chat — as many grafs as you want, 25 on any subject — there will be no publishing, because it will be impossible,” Diller said last month at a conference.
“What you have to do is get the industry to say you cannot scrape our content until you work out systems where the publisher gets some avenue toward payment,” Diller continued, saying that Google will face this problem.
Diller says he believes publishers can sue AI firms under copyright law and that current “fair use” restrictions need to be redefined. The Financial Times reported Wednesday that Diller is leading a group of publishers “that is going to say we are going to change copyright law if necessary.” An IAC spokesperson declined a request to make Diller available for an interview.
One challenge facing publishers is confirming that their content is being used by AI. Google did not reveal the training sources for PaLM 2, the large language model that underpins SGE, and Daugette says that while he’s seen examples of quotes and review scores from competitors repurposed on Bard without attribution, it’s hard to tell when the information is from his site without directly linked sources.
A Google spokesperson said the company had no plans to share regarding compensating publishers for training data.
“We’re introducing this new generative AI experience as an experiment in Search Labs to help us iterate and improve, while incorporating feedback from users and other stakeholders,” Google said in a statement.
“PaLM 2 is trained on a wide range of openly available data on the internet and we obviously value the health of the web ecosystem. And that’s really part of the way we think about how we build our products, to ensure that we have a healthy ecosystem where creators are a part of that thriving ecosystem,” Google VP of Research Zoubin Ghahramani said in a media briefing earlier this week.
Daugette says Google’s moves make being an independent publisher tough.
“I think it’s really frustrating for our industry to have to worry about our hard work being taken, when so many colleagues are being laid off,” Daugette said. “It’s just not okay.”
— CNBC’s Jordan Novet contributed reporting.