Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list PDF

  • qaz@lemmy.world
    8 months ago

    Does anyone have a link to the .txt file? I can’t grep the PDF.

  • sunbytes@lemmy.world
    8 months ago

    I mean, the API is open.

    I’ve been operating MORE privately on here than I would have on a closed/limited API.

    This data was always going to end up harvested.

  • FlyingCircus@lemmy.world
    8 months ago

    So I’m seeing leftist and NSFW instances being mainly targeted. Are they training AI, or collecting kompromat?

  • absquatulate@lemmy.world
    8 months ago

    Can’t wait for that LLM to become a Reddit-hating, bloodthirsty, Linux-obsessed furry femboy communist tankie with a weird fondness for beans, Star Trek, and sturgeon.

  • HakFoo@lemmy.sdf.org
    8 months ago

    Now I want to see a fully Hexbearified LLM.

    Instead of racist conspiracy theories, it will divert every topic to beans. And the saucy images will be mostly of cuties from Soviet posters.

    • WalnutLum@lemmy.ml
      8 months ago

      That’s good and also somewhat disappointing, as they were the first to release their model weights openly, along with the code to run them.

      A lot of fully open-source (and “ethically trained”, depending on your opinion of that entire idea) models still use major portions of the code they open-sourced.

      A lot of relatively “good” LLMs run on top of llama.cpp.

      • brucethemoose@lemmy.world
        8 months ago

        Meta pays for PyTorch development as well!

        llama.cpp will be fine, of course; it technically has nothing to do with Meta.

        But yeah, it’s mostly disappointing IMO…

        And kinda stupid. These are literally experimental models; they release one experiment with mixed results (and, admittedly, catastrophic marketing for it), and Zuck pulls the rug?

  • InvalidName2@lemmy.zip
    8 months ago

    This is why I go out of my way quite a bit to poison the AI with my pointless boomer anecdotes, largely made up or confiscated. Plus, I rarely proof read my comments anymore, so apologies for the grammatical issues and the hard to believe and rarely either one way or the other but twice the times there’s another type of type that you can also quite not, right?

  • lazynooblet@lazysoci.al
    8 months ago

    My instance gets pillaged once a day for 20 minutes by what I think is a scraper for an LLM.

    The scraper grabs every post and profile page, and the load on the server triggers alerts, but the site stays usable.

    I haven’t been able to put a stop to it, as the requests come from 1500+ IP addresses with different user agents.
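One way to at least confirm the daily window and the many-IP pattern described above is to bucket access-log hits on post and profile pages by minute, ignoring which IP sent them. This is a minimal sketch under assumptions not in the original post: the server writes the nginx/Apache "combined" log format, post and profile pages live under Lemmy's `/post/<id>` and `/u/<name>` paths, and `find_bursts` and its `threshold` are hypothetical names chosen for illustration. It only identifies burst minutes; it does not block anything.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed "combined" access-log format:
# IP - user [dd/Mon/yyyy:HH:MM:SS zone] "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Post and profile pages, assuming Lemmy's /post/<id> and /u/<name> URL scheme.
TARGET_RE = re.compile(r"^/(post|u)/")


def find_bursts(lines, threshold):
    """Count post/profile hits per minute across ALL source IPs.

    Returns (minute, hits, distinct_ips, distinct_user_agents) for every
    minute whose hit count is at least `threshold` (a hypothetical cutoff).
    """
    hits = defaultdict(int)
    ips = defaultdict(set)
    agents = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        # Skip unparseable lines and anything that is not a post/profile page.
        if not m or not TARGET_RE.match(m["path"]):
            continue
        # %b expects English month abbreviations, as nginx/Apache write them.
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        minute = ts.replace(second=0)
        hits[minute] += 1
        ips[minute].add(m["ip"])
        agents[minute].add(m["ua"])
    return [
        (minute, hits[minute], len(ips[minute]), len(agents[minute]))
        for minute in sorted(hits)
        if hits[minute] >= threshold
    ]
```

Aggregating per minute rather than per IP is the point of the design: a per-IP rate limit never trips when each of 1500+ addresses sends only a handful of requests, but the combined volume on post and profile pages still stands out, and the distinct-IP and distinct-user-agent counts show how spread out the traffic is.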

  • 🌈 vanta rainbow black 🌈@lemmy.blahaj.zone
    8 months ago

    FediPact has compiled a list of the fediverse instances that appear in this leak!!!

    • mastodon.social

    • mastodon.online

    • tech.lgbt

    • hackers.town

    • chaos.social

    • mastodon.org.uk

    • mastodont.cat

    • mastodon.de

    • mastodon.xyz

    • mastodon.coffee

    • mastodon.cloud

    • mastodon.scot

    • mastodonapp.uk

    • mastodon.green

    • mastodon.ml

    • mastodon.au

    • mastodon.eus

    • mastodonczech.cz

    • mastodon.sdf.org

    • mstdn.social

    • troet.cafe

    • techhub.social

    • tchncs.de

    • kolektiva.social

    • mamot.fr

    • defcon.social

    • meow.social

    • social.linux.pizza

    • ioc.exchange

    • eldritch.cafe

    • yiff.life

    • furry.engineer

    • infosec.exchange

    • blahaj.zone

    • woof.group

    • union.place

    • queer.party

    • sakurajima.moe

    • pawb.social

    • digipres.club

    • journa.host

    • corteximplant.net

    • corteximplant.com

    • octodon.social

    • bitbang.social

    • jorts.horse

    • tenforward.social

    • pnw.zone

    • spore.social

    • hear-me.social

    • neuromatch.social

    • vt.social

    • cosocial.ca

    • chitter.xyz

    • tooter.social

    • cloudisland.nz

    • social.seattle.wa.us

    • masto.es

    • nobigtech.es

    • mastodon.gal

    • masto.host

    • toot.community

    • pony.social

    • climatejustice.global

    • pleroma.envs.net

    • indiepocalypse.social

    • anarchism.space

    • disroot.org

    • dragonscave.space

    • toot.bike

    • fuzzies.wtf

    • norden.social

    • beige.party

    • ohai.social

    • freeradical.zone

    • metalhead.club

    • treehouse.systems

    • icosahedron.website

    • sunbeam.city

    • sunny.garden

    • zeroes.ca

    • ursal.zone

    • chaosfem.tw

    • mas.to

    • mathstodon.xyz

    • rubber.social

    • todon.nl

    • cupoftea.social

    • nerdculture.de

    • toad.social

    from https://cyberpunk.lol/@FediPact/115000125449696514

  • sun@slrpnk.net
    8 months ago

    Anything published on the fediverse is something anyone can get their hands on.