Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

  • cerevant@lemmy.world
    English
    arrow-up
    77
    arrow-down
    41
    ·
    1 year ago

    There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

    There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

    When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

    Folks, this isn’t a new problem, and it doesn’t need new laws.

    • Dark Arc@lemmy.world
      English
      arrow-up
      65
      arrow-down
      11
      ·
      1 year ago

      It’s 100% a new problem. There’s established precedent for things costing different amounts depending on their intended use.

      For example, buying a consumer copy of a song doesn’t give you the right to play that song in a stadium or a restaurant.

      Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.

        • DandomRude@lemmy.world
          English
          arrow-up
          12
          arrow-down
          1
          ·
          1 year ago

          OpenAI and such being forced to pay a share seems far from the worst scenario I can imagine. I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes. That could really mean that large parts of humanity would be cut off from knowledge.

          I can well imagine copyleft gaining importance in this context. But this form of licencing seems pretty worthless to me if you don’t have the time or resources to sue for your rights - or even to deal with the various forms of licencing you need to know about to do so.

          • kklusz@lemmy.world
            English
            arrow-up
            5
            arrow-down
            4
            ·
            1 year ago

            I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes.

            None of them are forced to stop making their works freely available. If they want to voluntarily stop making their works freely available to prevent commercial interests from using them, that’s on them.

            Besides, that’s not so bad to me. The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

            That could really mean that large parts of humanity would be cut off from knowledge.

            On the contrary, AI is making knowledge more accessible than ever before to large parts of humanity. The only other comparable technologies that have done this in recent times are the internet and search engines. Thank goodness the internet enables piracy that allows anyone to download troves of ebooks for free. I look forward to AI doing the same on an even greater scale.

            • Flying Squid@lemmy.world
              English
              arrow-up
              8
              arrow-down
              2
              ·
              1 year ago

              Shouldn’t there be a way to freely share your works without having to expect an AI to train on them and then be able to spit them back out elsewhere without attribution?

              • kklusz@lemmy.world
                English
                arrow-up
                1
                ·
                1 year ago

                No, there shouldn’t because that would imply restricting what I can do with the information I have access to. I am in favor of maintaining the sort of unrestricted general computing that we already have access to.

            • CmdrShepard@lemmy.one
              English
              arrow-up
              1
              arrow-down
              1
              ·
              1 year ago

              The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

              You’re not talking about sharing it with humanity, you’re talking about feeding it into an AI. How is this holding back the creative potential of humanity? Again, you’re talking about feeding and training a computer with this material.

        • CmdrShepard@lemmy.one
          English
          arrow-up
          1
          arrow-down
          2
          ·
          1 year ago

          Even the most reasonable law in the world can’t be enforced on someone who broke it 6 months before it was legislated.

          Sure it can. Just because it is a new law doesn’t mean they get to continue benefiting from IP ‘theft’ forever into the future.

          Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the “hands off” clause

          How is this an issue for the IP holders? Just because you build something cool or useful doesn’t mean you get a pass to do what you want.

          basically like the knowledge cutoff, but forever. It’s untenable,

          Untenable for ChatGPT maybe, but it’s not as if it’s the end of ‘knowledge’ or the end of AI. It’s just a single company product.

      • bouncing@partizle.com
        English
        arrow-up
        6
        arrow-down
        6
        ·
        1 year ago

        The thing is, copyright isn’t really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn’t really making a copy of that work. It’s transformative.

        Should there be some kind of new model of remuneration for creators? Probably. But it should be a compulsory licensing model.

        • jecxjo@midwest.social
          English
          arrow-up
          5
          arrow-down
          1
          ·
          1 year ago

          The slippery slope here is that we are currently considering humans and computers to be different because (something someone needs to actually define). If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.” We have laws already that deal with this but honestly how many books and movies aren’t just remakes of Romeo and Juliet or Taming of the Shrew?!?

          • bouncing@partizle.com
            English
            arrow-up
            4
            arrow-down
            2
            ·
            1 year ago

            If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.”

            You’re bounded by the limits of your flesh. AI is not. The $12 you spent buying a book at Barnes & Noble was based on the economy of scarcity that your human abilities constrain you to.

            It’s hard to say that the value proposition is the same for human vs AI.

            • jecxjo@midwest.social
              English
              arrow-up
              2
              arrow-down
              1
              ·
              1 year ago

              We are making an assumption that humans do “human things”. If I wrote a derivative work of your $12 book, does it matter that the way I wrote it was to use a pen and paper and create a statistical analysis of your work and find the “next best word” until I had a story? Sure, my book took 30 years to write, but if I followed the same math as an AI, would that matter?

              • BartsBigBugBag@lemmy.tf
                English
                arrow-up
                0
                ·
                1 year ago

                It’s not even looking for the next best word. It’s looking for the next best token. It doesn’t know what words are. It reads tokens.

                • jecxjo@midwest.social
                  English
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  1 year ago

                  Good point.

                  I could easily see laws created that blanket-outlaw computer-generated output derived from human-created data sets, and suddenly medical and technical advancements stop because the laws were written by people who don’t understand what is going on.

              • bouncing@partizle.com
                English
                arrow-up
                2
                arrow-down
                2
                ·
                1 year ago

                It wouldn’t matter, because derivative works require permission. But I don’t think anyone’s really made a compelling case that OpenAI is actually making directly derivative work.

                The stronger argument is that LLMs are making transformational work, which is normally fair use, but should still require some form of compensation given the scale of it.

                • jecxjo@midwest.social
                  English
                  arrow-up
                  2
                  arrow-down
                  1
                  ·
                  1 year ago

                  But no one is complaining about publishing derived work. The issue is that “the robot brain has full copies of my text and anything it creates ‘cannot be transformative’”. This doesn’t make sense to me because my brain made a copy of your book too, it’s just really lossy.

                  I think right now we have definitions for the types of works that only loosely fit human actions mostly because we make poor assumptions of how the human brain works. We often look at intent as a guide which doesn’t always work in an AI scenario.

                  • bouncing@partizle.com
                    English
                    arrow-up
                    1
                    ·
                    1 year ago

                    Yeah, that’s basically it.

                    But I think what’s getting overlooked in this conversation is that it probably doesn’t matter whether it’s AI or not. Either new content is derivative or it isn’t. That’s true whether you wrote it or an AI wrote it.

        • Fedizen@lemmy.world
          English
          arrow-up
          2
          arrow-down
          3
          ·
          1 year ago

          Challenge level impossible: try uploading something long to Amazon written by ChatGPT without triggering the plagiarism detector.

      • cerevant@lemmy.world
        English
        arrow-up
        7
        arrow-down
        8
        ·
        1 year ago

        My point is that the restrictions can’t go on the input, it has to go on the output - and we already have laws that govern such derivative works (or reuse / rebroadcast).

    • scarabic@lemmy.world
      English
      arrow-up
      43
      arrow-down
      19
      ·
      1 year ago

      When you sell a book, you don’t get to control how that book is used.

      This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.

      Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.

      This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.

      • cerevant@lemmy.world
        English
        arrow-up
        10
        arrow-down
        3
        ·
        1 year ago

        No, you misunderstand. Yes, they can control how the content in the book is used - that’s what copyright is. But they can’t control what I do with the book - I can read it, I can burn it, I can memorize it, I can throw it up on my roof.

        My argument is that there is nothing wrong with training an AI with a book - that’s input for the AI, and that is indistinguishable from a human reading it.

        Now what the AI does with the content - if it plagiarizes or violates fair use - that’s a problem, but those problems are already covered by copyright laws. They have no more business saying what can or cannot be input into an AI than they can restrict what I can read (and learn from). They can absolutely enforce their copyright on the output of the AI just like they can if I print copies of their book.

        My objection is strictly on the input side, and the output is already restricted.

        • Redtitwhore@lemmy.world
          English
          arrow-up
          5
          arrow-down
          1
          ·
          1 year ago

          Makes sense. I would love to hear how anyone can disagree with this. Just because an AI learned or trained from a book doesn’t automatically mean it violated any copyrights.

          • cerevant@lemmy.world
            English
            arrow-up
            3
            arrow-down
            2
            ·
            edit-2
            1 year ago

            The base assumption of those with that argument is that an AI is incapable of being original, so it is “stealing” anything it is trained on. The problem with that logic is that’s exactly how humans work - everything they say or do is derivative from their experiences. We combine pieces of information from different sources, and connect them in a way that is original - at least from our perspective. And not surprisingly, that’s what we’ve programmed AI to do.

            Yes, AI can produce copyright violations. They should be programmed not to. They should cite their sources when appropriate. AI needs to “learn” the same lessons we learned about not copy-pasting Wikipedia into a term paper.

      • lily33@lemmy.world
        English
        arrow-up
        3
        arrow-down
        1
        ·
        edit-2
        1 year ago

        It’s specifically distribution of the work or derivatives that copyright prevents.

        So you could make an argument that an LLM that’s memorized the book and can reproduce (parts of) it upon request is infringing. But one that’s merely trained on the book, but hasn’t memorized it, should be fine.

        • scarabic@lemmy.world
          English
          arrow-up
          2
          arrow-down
          3
          ·
          1 year ago

          But by their very nature LLMs simply redistribute the material they’ve been trained on. They may disguise it assiduously, but there is no person at the center of the thing adding creative strokes. It’s copyrighted material in, copyrighted material out, so the plaintiffs allege.

          • lily33@lemmy.world
            English
            arrow-up
            1
            arrow-down
            1
            ·
            1 year ago

            They don’t redistribute. They learn information about the material they’ve been trained on - not the material itself* - and can use it to generate material they’ve never seen.

            * Bigger models seem to memorize some of the material and can infringe, but that’s not really the goal.

    • volkhavaar@lemmy.world
      English
      arrow-up
      14
      ·
      1 year ago

      This is a little off. When you quote a book, you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?

      I think the gist of these authors’ complaints is that a sort of “technology laundered plagiarism” is occurring.

      • cerevant@lemmy.world
        English
        arrow-up
        1
        arrow-down
        1
        ·
        1 year ago

        Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.

        That is very different than saying that you can’t feed legally acquired content into an AI.

    • Cloudless ☼@feddit.uk
      English
      arrow-up
      16
      arrow-down
      4
      ·
      1 year ago

      I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

      “He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’”

      It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

      • cerevant@lemmy.world
        English
        arrow-up
        9
        arrow-down
        7
        ·
        edit-2
        1 year ago

        Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.

        What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.

        • GentlemanLoser@reddthat.com
          English
          arrow-up
          9
          arrow-down
          4
          ·
          1 year ago

          No, the AI should be shut down and the owner should first be paying the statutory damages for each use of registered works of copyright (assuming all parties in the USA)

          If they have a company left after that, then they can fix the AI.

          • cerevant@lemmy.world
            English
            arrow-up
            6
            arrow-down
            1
            ·
            1 year ago

            Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.

        • DandomRude@lemmy.world
          English
          arrow-up
          4
          arrow-down
          1
          ·
          1 year ago

          I think it’s not just the output. I can buy an image on any stock platform, print it on a T-shirt, wear it myself or gift it to somebody. But if I want to sell T-shirts using that image I need a commercial licence - even if I alter the original image extensively or combine it with other assets to create something new. It’s not exactly the same thing, but OpenAI and other companies certainly use copyrighted material to create and improve commercial products. So this doesn’t seem like the same kind of usage an average Joe buys a book for.

    • assassin_aragorn@lemmy.world
      English
      arrow-up
      16
      arrow-down
      9
      ·
      1 year ago

      However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

      It’s an algorithm that’s been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.

      You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That’s all an algorithm is. An execution of programmed tasks.

      If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn’t an AI have to do the same?

      • bouncing@partizle.com
        English
        arrow-up
        11
        arrow-down
        1
        ·
        1 year ago

        If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

        Well, if OpenAI knowingly used pirated work, that’s one thing. It seems pretty unlikely and certainly hasn’t been proven anywhere.

        Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it’s hard to make the case that they’re really at fault any more than Google would be.

          • bouncing@partizle.com
            English
            arrow-up
            2
            arrow-down
            1
            ·
            edit-2
            1 year ago

            The published summary is open to fair use by web crawlers. That was settled in Perfect 10 v Amazon.

        • assassin_aragorn@lemmy.world
          English
          arrow-up
          2
          arrow-down
          1
          ·
          1 year ago

          Haven’t people asked it to reproduce specific chapters or pages of specific books and it’s gotten it right?

          • bouncing@partizle.com
            English
            arrow-up
            1
            arrow-down
            1
            ·
            1 year ago

            I haven’t been able to reproduce that, and at least so far, I haven’t seen any very compelling screenshots of it that actually match. Usually it just generates text, but that text doesn’t actually match.

    • bouncing@partizle.com
      English
      arrow-up
      7
      arrow-down
      2
      ·
      1 year ago

      There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

      That’s part of the allegation, but it’s unsubstantiated. It isn’t entirely coherent.

      • Flying Squid@lemmy.world
        English
        arrow-up
        5
        arrow-down
        2
        ·
        1 year ago

        It’s not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.

        • AnonStoleMyPants@sopuli.xyz
          English
          arrow-up
          3
          arrow-down
          1
          ·
          1 year ago

          I don’t know if this holds water though. You don’t need to train the AI on the book itself to get that result. Just on discussions about the book, which for sure include passages from the book.

        • bouncing@partizle.com
          English
          arrow-up
          3
          arrow-down
          1
          ·
          1 year ago

          Her lawsuit doesn’t say that. It says,

          when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works

          That’s an absurd claim. ChatGPT has surely read hundreds, perhaps thousands of reviews of her book. It can summarize it just like I can summarize Othello, even though I’ve never seen the play.