marginalia

@marginalia@mastodon.social

I've built the indie internet #searchengine #marginaliasearch. Working full time on this project.

tante (@tante@tldr.nettime.org), to random

Cool project: "Nepenthes" is a tarpit to catch (AI) web crawlers.

"It works by generating an endless sequences of pages, each of which with dozens of links, that simply go back into a the tarpit. Pages are randomly generated, but in a deterministic way, causing them to appear to be flat files that never change. Intentional delay is added to prevent crawlers from bogging down your server, in addition to wasting their time. Lastly, optional Markov-babble can be added to the pages, to give the crawlers something to scrape up and train their LLMs on, hopefully accelerating model collapse."

https://zadzmo.org/code/nepenthes/
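The "random, but deterministic" idea in the quote can be sketched as follows. This is a minimal illustration, not Nepenthes' actual code: the names `WORDS`, `tarpit_links`, and `render_page` are hypothetical, and the approach shown (seeding a private PRNG from a hash of the requested path) is an assumption about one way to get pages that look like unchanging flat files. The intentional delay and Markov babble are left out.

```python
import hashlib
import random

# Hypothetical vocabulary used to build fake URL segments.
WORDS = ["amber", "basil", "cedar", "delta", "ember", "fjord", "grove", "heron"]


def tarpit_links(path: str, n_links: int = 5) -> list[str]:
    """Return the same pseudo-random links every time for a given path."""
    # Seed a private PRNG from a hash of the path, so repeated requests
    # for the same URL produce identical output.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [
        "/" + "/".join(rng.choice(WORDS) for _ in range(3))
        for _ in range(n_links)
    ]


def render_page(path: str) -> str:
    """Render a tiny HTML page whose links all lead back into the tarpit."""
    items = "\n".join(f'<a href="{href}">{href}</a>' for href in tarpit_links(path))
    return f"<html><body>\n{items}\n</body></html>"
```

Calling `render_page("/some/path")` twice returns the same HTML, which is what makes the generated pages look static to a crawler.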

marginalia (@marginalia@mastodon.social)

@tante While I get the urge to get back at misbehaving AI crawlers, these types of crawler traps have been around for decades, and are pretty trivial to detect and avoid.

Detecting word salad is also relatively easy: since large language models are literally statistical models of language, identifying aberrant linguistic patterns is not a hard problem.
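As a toy illustration of that point, here is a sketch that scores text by how many of its word pairs also occur in a reference sample. The helper names (`bigrams`, `train`, `seen_bigram_ratio`) are hypothetical, and this is far weaker than what a real pipeline would use. In particular, babble from a Markov chain trained on similar text would pass a bigram check, which is why a stronger statistical model is the more realistic detector.

```python
from collections import Counter


def bigrams(text: str) -> list[tuple[str, str]]:
    """Split text on whitespace and return adjacent lowercase word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def train(reference_text: str) -> Counter:
    """Count the word pairs seen in a sample of ordinary text."""
    return Counter(bigrams(reference_text))


def seen_bigram_ratio(model: Counter, text: str) -> float:
    """Fraction of the text's word pairs that also appear in the reference."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in model) / len(pairs)


model = train("the cat sat on the mat and the dog sat on the rug")
natural = seen_bigram_ratio(model, "the cat sat on the rug")
shuffled = seen_bigram_ratio(model, "rug the on sat cat the")
```

Here `natural` comes out higher than `shuffled`, because the shuffled sentence contains word pairs the reference sample never produced.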

Realistically, the flood of AI-authored spam blogs is doing a far better job of poisoning training data than these stunts ever could.