• ℍ𝕂-𝟞𝟝@sopuli.xyz · ↑15 · 1 day ago

    > AI does not triple traffic. It’s a completely irrational statement to make.

    Multiple testimonials from people who host sites say it does. Multiple Lemmy instances have also supported this claim.

    > I would bet that the number of requests per year of s resource by an AI scrapper is on the dozens at most.

    You obviously don’t know much about hosting a public server. Try dozens per second.

    There is a booming startup industry all over the world training AI, and scraping data to sell to companies training AI. It’s not just Microsoft, Facebook and Twitter doing it, but also Chinese companies trying to compete. Also companies not developing public models, but models for internal use. They all use public cloud IPs, so the traffic is coming from all over incessantly.

    > Using as much energy as a available per scrapping doesn’t even make physical sense. What does that sentence even mean?

    It means that when Microsoft buys a server for scraping, they are going to run it 24/7 with the CPU and network maxed out, at maximum power use, to get as much data as they can. If the server can scrape 100 sites per minute, it will scrape 100; if it can scrape 1000, it will scrape 1000; and if it can only do 10, it will do 10.

    It will never stop scraping, because stopping it would be the equivalent of shutting down a production line. Everyone always runs their scrapers as hard as they can. Ironically, increasing the cost of scraping would result in less energy consumed in total, since it would force companies to work more “smart” and less “hard” at scraping and training AI.
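
    To put rough numbers on the “smart vs. hard” point, here is a toy back-of-the-envelope model (every number is a made-up assumption, not a measurement): the box’s energy draw is fixed because it never idles, so raising the per-page cost only changes how many pages that fixed energy buys, which is what would push companies towards more selective scraping.

    ```python
    # Toy model of "scrapers always run flat out"; every number here is an assumption.
    CPU_SECONDS_PER_DAY = 86_400          # one core-equivalent kept busy 24/7

    def pages_per_day(cpu_seconds_per_page: float) -> float:
        """Pages scraped per day when the box never idles."""
        return CPU_SECONDS_PER_DAY / cpu_seconds_per_page

    print(pages_per_day(0.01))   # cheap requests: 8,640,000 pages/day
    print(pages_per_day(0.5))    # same box with an added per-page cost (e.g. PoW): 172,800 pages/day
    ```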

    Oh, and it’s S-C-R-A-P-I-N-G, not scrapping. It comes from the word “scrape”, meaning to remove the surface from an object using a sharp instrument, not “scrap”, which means to take something apart for its components.

    • daniskarma@lemmy.dbzer0.com · ↑1 ↓4 · 1 day ago

      I’m not a native English speaker, so I apologize if there’s bad English in my response, and I’d be thankful for any corrections.

      That being said, I do host public services, and did so before and after AI was a thing. I have asked many of these people who claim “we are under AI bot attacks” how they are able to tell whether a request is from an AI scraper or just any other scraper, and there was no satisfying answer.

      • ℍ𝕂-𝟞𝟝@sopuli.xyz · ↑11 · 1 day ago

        Yeah, but it doesn’t matter what the objective of the scraper is; the only thing that matters is that it’s an automated client that is going to send mass requests to you. If it weren’t, Anubis would not be a problem for it.

        The effect is the same: increased hosting costs and less access for legitimate clients. And sites want to defend against it.

        That said, it is not mandatory; as a host, you can avoid using Anubis. Nobody is forcing you to use it. And as someone who regularly gets locked out of services because I use a VPN, Anubis is one of the least intrusive protection methods out there.

        • daniskarma@lemmy.dbzer0.com · ↑1 ↓5 · edited · 1 day ago

          It’s very intrusive in the sense that it runs a PoW challenge, unsolicited, on the client. That’s literally like having a cryptominer running on your computer for each challenge.
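
          For reference, a minimal sketch of what a hash-based PoW challenge amounts to (a generic illustration, not Anubis’s actual code; the difficulty value is an arbitrary assumption):

          ```python
          import hashlib
          import itertools

          DIFFICULTY_BITS = 20  # arbitrary example difficulty

          def solve(seed: str) -> int:
              """Client side: burn CPU until sha256(seed + nonce) has DIFFICULTY_BITS leading zero bits."""
              target = 1 << (256 - DIFFICULTY_BITS)
              for nonce in itertools.count():
                  digest = hashlib.sha256(f"{seed}{nonce}".encode()).digest()
                  if int.from_bytes(digest, "big") < target:
                      return nonce

          def verify(seed: str, nonce: int) -> bool:
              """Server side: a single hash to check the client's answer."""
              digest = hashlib.sha256(f"{seed}{nonce}".encode()).digest()
              return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))
          ```

          The client pays that CPU cost on every challenge, while the server only needs one hash to verify the answer.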

          Everyone can do what they want with their server, of course. But I’m personally very fond of scraping. For instance, I have FreshRSS running on my server, and the way it works is that when the target website doesn’t provide an RSS feed, it scrapes the page to get the articles. I also have another service that scrapes pages to detect changes.
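
          The page-change service, for instance, is nothing more exotic than something like this (a rough sketch; the URL and polling interval are made up, and it assumes the third-party requests library):

          ```python
          import hashlib
          import time

          import requests

          URL = "https://example.org/some-page"   # placeholder target
          INTERVAL = 3600                         # poll once an hour

          last_hash = None
          while True:
              page = requests.get(URL, timeout=30)
              page_hash = hashlib.sha256(page.content).hexdigest()
              if last_hash is not None and page_hash != last_hash:
                  print("page changed")           # in practice: notify, store a diff, etc.
              last_hash = page_hash
              time.sleep(INTERVAL)
          ```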

          I think part of the beauty of the internet is being able to automate processes, and software like Anubis puts a globally significant energy tax on these automations.

          Once again, everyone can do whatever they want with their server. But the thing I like the least is that the software is being promoted, with some great PR, as part of some great anti-AI crusade; I don’t know whether by the devs themselves or by some other party. I dislike this mostly because I think it is disinformation, and manipulative towards people who are perhaps easy to manipulate if you say the right words. I also think it’s a discourse that pushes people towards radicalization on this topic, and I’m a firm believer that right now we need to reduce radicalization overall, not increase it.

          • xthexder@l.sw0.com · ↑2 · edited · 21 hours ago

            A proof of work challenge is infinitely better than the alternative of “fuck you, you’re accessing this through a VPN and the IP is banned for being owned by Amazon (or literally any data center)”