@weirdwriter @indirectferret @hyenagirl64 The filter was on the *content* of the post, not on the instance or the user (though there was, I guess, a score given to users in the back end).
Here's an example:
"Hey man, I'm not sure you're thinking of this correctly" would be fine and get through.
"Fuck off, you shitty little twit, eat rocks" probably wouldn't.
The second post would get posted as far as the person who wrote it knew, but I'd never see it and it would get no engagement.
@TechConnectify @weirdwriter @indirectferret @hyenagirl64
I know we're probably a long way from any kind of algorithmic filter like that on here, but have you experimented with the existing filter system?
You can block specific phrases, not just words, so you can hit up the things causing you the most psychic damage and literally make them disappear.
@lockelyfox That doesn't really seem like a solution. Even blocking the phrase "fuck off" probably hides a lot more posts that I'd want to see than posts I'd rather not see. A more effective tool would let you filter based on tone or semantic content (i.e. it would sense when vitriol is directed at the recipient rather than at, say, poorly designed brake lights). It can absolutely be done but the challenge may be how to implement controls at the user or instance level.
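To illustrate the false-positive problem with a toy example (the posts and function here are made up, not any real filter implementation): a plain phrase filter matches text wherever it appears, so it hides the benign use right alongside the attack:

```python
# Toy phrase filter: hides any post containing a blocked phrase,
# with no sense of who the hostility is aimed at.
def keyword_filter(post, blocked_phrases):
    """Return True if the post should be hidden."""
    text = post.lower()
    return any(phrase in text for phrase in blocked_phrases)

posts = [
    "Fuck off, you shitty little twit",    # an attack on the reader
    "I told the scam caller to fuck off",  # same phrase, harmless
]

for post in posts:
    print(keyword_filter(post, ["fuck off"]))  # True for both posts
```

Both posts get hidden, even though only the first one is the kind of thing the filter was meant to catch.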
@zxo
I'm not sure why you'd want to see posts telling you to fuck off, but neither you nor I is anywhere near the level Alex is at in terms of pure volume of interaction and engagement.
We're pretty far off from an algorithmic solution, and we haven't worked out whether the cycles spent running such an algorithm should fall on the instance owner or the user, either.
It's easy to say "just build a tool that detects tone," but then we're looking at LLM-type models, and that gets *incredibly expensive incredibly fast.* Just look at how much Copilot is costing MS.
@lockelyfox I *don't* want to see posts telling ME to fuck off, but I may want to see other posts that contain the text "fuck off". Keyword filters just aren't going to cut it for what Alex seems to be asking for, without causing massive numbers of false positives.
Clearly Twitter had an algorithm that did this, at least for popular accounts. It needn't be a full-blown LLM, just a narrow model trained to identify whether a post is a personal attack. Would that be more doable?
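A minimal sketch of the intuition behind such a narrow filter. This is a made-up word-list heuristic, not a trained classifier, and the lexicons and function name are invented for illustration; the point is just that "insult words aimed at *you*" is a narrower signal than "insult words anywhere":

```python
# Hypothetical toy heuristic: flag a post only when an insult word
# co-occurs with second-person address. A real narrow classifier would
# learn this distinction from labeled data instead of word lists.
INSULTS = {"twit", "idiot", "moron"}
SECOND_PERSON = {"you", "your", "you're"}

def looks_like_personal_attack(post: str) -> bool:
    words = {w.strip(".,!?\"'").lower() for w in post.split()}
    return bool(words & INSULTS) and bool(words & SECOND_PERSON)

looks_like_personal_attack("Fuck off, you shitty little twit")
# -> True: insult + aimed at the reader

looks_like_personal_attack("That driver is a twit about brake lights")
# -> False: insult, but not directed at the recipient
```

Word lists obviously won't hold up in practice; a small trained classifier would replace the two sets with learned features, but the user-facing behavior (hide attacks on *me*, keep everything else) is the same.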