A lawsuit claims Google has been 'secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans' to train its AI

L4sBot@lemmy.world · 1 year ago

popemichael@lemmy.world · 1 year ago

Indexing a site isn’t stealing from it.

Plus, you can shut all that down with some simple HTML.

fubo@lemmy.world · 1 year ago

If you own a web site and believe that it is “stealing” for AI bots to read your site’s content and learn from it, do you also believe that search engine indexing is “stealing”? Search engine indexing involves the search engine bot downloading all the public content of your site and building a model (the index) from it. That is how it’s possible for search engine users to find your site.

If you do believe search engine indexing is “stealing”, have you blocked Googlebot, Bingbot, BaiduSpider, DuckDuckBot, YandexBot, etc. in your robots.txt?
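Here is a minimal sketch of the kind of robots.txt the comment describes, checked with Python's standard `urllib.robotparser`. The file contents and the `example.com` URLs are hypothetical. The crawler names are the ones listed above. Obeying robots.txt is voluntary on the crawler's side, so this only shows what a well-behaved bot would conclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the search crawlers named in the comment
# from the whole site, and leave every other user agent unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: BaiduSpider
User-agent: DuckDuckBot
User-agent: YandexBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(agent: str, url: str) -> bool:
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string, no network access
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "YandexBot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://example.com/blog/post"))
```

The named crawlers get `False` for every path, and any other agent falls through to the `*` group and gets `True`.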

“Publishing” means making public.

If you write a book, you own the copyright to the book. But the fact that the text of your book contains a particular word, e.g. the word “mesothelioma”, is a public fact. You don’t own that fact.

A search engine for book content can read your book, and record the fact that it contains the word “mesothelioma” in its model; and then when someone searches for that word, it can return a link to your book.

Creating the index meant that the search engine internally made a copy of the text of your book. However, serving search results is not a copyright infringement; rather, it is stating the true fact that your book contains that word.
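The word-to-book lookup described above is usually called an inverted index. The Python sketch below is minimal, and the book titles and texts in it are made up for illustration. Building the index means reading every book in full, but a search only answers "which books contain this word?" Real search engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(books: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of book titles whose text contains it.

    Building the index reads the full text of every book, but the
    result only records which words appear in which books.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for title, text in books.items():
        for word in tokenize(text):
            index[word].add(title)
    return index


def search(index: dict[str, set[str]], word: str) -> list[str]:
    """Return the titles containing `word` (pointers to books, not their text)."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    # Hypothetical book texts, for illustration only.
    books = {
        "Asbestos and You": "Exposure to asbestos can cause mesothelioma.",
        "A Field Guide to Birds": "The robin is a common garden bird.",
    }
    index = build_index(books)
    print(search(index, "mesothelioma"))  # ['Asbestos and You']
```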

Similarly, if you write a book about how asbestos causes mesothelioma, that fact is not your property. If someone borrows your book from the library, reads it, and learns that fact, they do not owe you money. Even if they go around telling everyone about mesothelioma, they still do not owe you any money.

If they are an academic, the rules of academic publishing say that they are supposed to cite your work as a source — telling their readers that they learned something from your work. But if they don’t, that’s still not copyright infringement; it’s plagiarism, which is not a crime but rather an offense against academic honor.

plebeian_@lemmy.ml · 1 year ago

Search engines point to your site, though. You are getting something back. An LLM won’t give a reference. It’s something else altogether.

And there is no “robots.txt” to block LLM training scrapers.

Just because you publish something doesn’t imply you forfeit copyright.

Touching_Grass@lemmy.world · 1 year ago

Their work isn’t being reproduced and sold. Seems like fair use. I hate to say it, but I’m with Google on this. Things would get much worse if these lawsuits succeeded.

AbidanYre@lemmy.world · 1 year ago

No, but it is being used commercially for a profit.

This seems like a situation copyright law never saw coming.

Flying Squid@lemmy.world · 1 year ago

This is the problem Sarah Silverman had and why she is joining in on a lawsuit. It’s not just that it trained on her book, it’s that if you ask it to do so, it will regurgitate passages from her book verbatim. That is why this is problematic.

SCB@lemmy.world · 1 year ago

That’s not really problematic, since if anyone online asks me to quote one of many books, I can copy-paste passages verbatim and it isn’t a copyright violation.

Happens all the time in online communities dedicated to book discussion.