Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • lazynooblet@lazysoci.al · 47 upvotes · edited 7 days ago

    My instance gets pillaged once a day for 20 minutes by what I think is a scraper for an LLM.

    The scraper grabs every post and profile page, and the load on the server triggers alerts, but the site stays usable.

    I haven’t been able to put a stop to it, as the requests come from 1500+ IP addresses with different user agents.
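
Per-IP rate limits tend not to catch traffic spread over this many addresses, but a burst like the one described should still show up as a spike in the number of distinct IPs per time window. Below is a minimal sketch of that idea, assuming access-log entries have already been parsed into (epoch_seconds, ip) pairs. The function name, the 60-second window, and the threshold of 100 distinct IPs are all hypothetical choices for illustration, not anything Lemmy or a particular web server provides.

```python
from collections import defaultdict


def find_distributed_bursts(requests, window_seconds=60, min_distinct_ips=100):
    """Return (start, end, peak_distinct_ips) for each run of busy windows.

    requests: iterable of (epoch_seconds, ip) pairs (assumed input format).
    A window is "busy" when at least min_distinct_ips different addresses
    hit the server inside it. Adjacent busy windows are merged into one burst.
    """
    # Collect the set of distinct client IPs seen in each fixed-size window.
    ips_per_window = defaultdict(set)
    for epoch_seconds, ip in requests:
        ips_per_window[int(epoch_seconds) // window_seconds].add(ip)

    # Keep only the windows that exceed the (hypothetical) threshold.
    busy = sorted(
        w for w, ips in ips_per_window.items() if len(ips) >= min_distinct_ips
    )

    # Merge consecutive busy windows into bursts.
    bursts = []
    for w in busy:
        start = w * window_seconds
        end = start + window_seconds
        count = len(ips_per_window[w])
        if bursts and bursts[-1][1] == start:
            prev_start, _, prev_peak = bursts[-1]
            bursts[-1] = (prev_start, end, max(prev_peak, count))
        else:
            bursts.append((start, end, count))
    return bursts
```

Knowing when the daily burst starts and how long it lasts would at least make it possible to apply stricter limits only during that window. The threshold would need tuning against the instance's normal traffic.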