if your fedi stats bot is being thrown off by instances providing inaccurate data, that's a you issue, and it's happening because you don't know what data is and how to collect it, and much more importantly, you're noticing the exaggerated numbers from GtS instances because they are, quite conveniently, illogical numbers
so while you're being thrown off by "baffled" stats, don't you think you're also excessively vulnerable to maliciously tweaked stats that aren't obviously inaccurate?
perhaps you shouldn't collect "data" by merely hitting some endpoint in some web apps, and, if you really care about accuracy, bother for a wee moment to build some basic statistical checks into your software that can easily spot exaggerated and otherwise implausible reports from instance APIs?
you can, for example, make a basic guess that an instance wouldn't grow >x% per day, and code a case in your app so that any growth that exceeds that is flagged for manual review
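a minimal sketch of that kind of check, in python. everything in it is an assumption made up for illustration: the `Report` shape, the 10% per day ceiling, and the small absolute slack so that tiny instances going from 1 to 3 users don't trip the alarm. it's not how any real stats bot works, just the shape of the idea:

```python
from dataclasses import dataclass

# assumption: placeholder thresholds, tune them against your own history
MAX_DAILY_GROWTH = 0.10   # the "x%" from the post, here 10% per day
ABSOLUTE_SLACK = 50       # extra users always tolerated, so tiny instances aren't flagged


@dataclass
class Report:
    """hypothetical record of what an instance claimed on a given day."""
    instance: str
    day: int          # day number, e.g. days since you started collecting
    user_count: int


def needs_review(previous: Report, current: Report,
                 max_daily_growth: float = MAX_DAILY_GROWTH,
                 slack: int = ABSOLUTE_SLACK) -> bool:
    """return True if the new report is implausible and a human should look at it."""
    if current.user_count < 0:
        return True  # a negative user count is never real
    days = current.day - previous.day
    if days <= 0:
        return True  # duplicate or out-of-order report, don't trust it blindly
    # largest count we'd believe: compound growth since the last report, plus slack
    ceiling = previous.user_count * (1 + max_daily_growth) ** days + slack
    return current.user_count > ceiling


yesterday = Report("example.social", day=0, user_count=1000)
print(needs_review(yesterday, Report("example.social", day=1, user_count=1050)))
print(needs_review(yesterday, Report("example.social", day=1, user_count=5_000_000)))
```

the first report passes and the second one gets flagged. note that this only catches the loud lies: a quiet, consistent distortion stays under the ceiling, so you'd need other checks for that.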
stop and think about this for a moment. you wrote a thing that asks websites how popular they are, and takes their answer for Granted, without interrogation...
all this talk about robots.txt and ideas about consent is secondary. your software is Vulnerable
if someone snuck a malicious patch into the mastodon docker image that reports all instances as 25% smaller, and they were smart about it, your code would report fedi as maybe 20% smaller, and nobody would notice without fucking Forensics…
to be clear my position is, 100%, robots.txt is sacrosanct, and i think all this bean counting is bullshit. it's uninteresting and boring to me
but i am posting this thread based on my familiarity with quantitative social science as an MA in linguistics, and from that PoV all i can say is, you don't get to complain about your participants. your data is your responsibility. people you observe don't live to be observed or to comply. resilience of your measurement tools is your responsibility
question: those of you who went abroad on erasmus during your doctoral studies, what was the experience like? would you recommend doing it? was it worth it?
it seems like part of my doctorate will benefit from archives and mayhaps oral history interviews in the northeastern mediterranean, and methinks it would make sense to just spend 6-12mo someplace nearby, instead of dealing with many visas and travelling back and forth from home