#lemmy/#kbin has a problem that #mastodon hasn't even attempted to solve: groups, and what happens when they get popular.
#Communities, #groups, #magazines, whatever they are called, are implemented as #Actors in #ActivityPub. They are basically just *very* popular users who boost a *lot*.
You can't just distribute them across instances the way normal actors do. Whichever server hosts @technology@lemmy.ml or @technology@beehaw.org is going to get HOSED on the regular.
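To make the "popular user who boosts a lot" picture concrete, here is a rough sketch of a community as an ActivityStreams `Group` actor that wraps each post in an `Announce`. The host `lemmy.example`, the follower URLs, and the helper names are made up for illustration, and the one-delivery-per-host rule is a simplifying assumption, not a description of how Lemmy actually batches deliveries.

```python
from urllib.parse import urlparse

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"

# Hypothetical community actor; the host is a placeholder, not a real instance.
community = {
    "@context": AS_CONTEXT,
    "type": "Group",
    "id": "https://lemmy.example/c/technology",
    "inbox": "https://lemmy.example/c/technology/inbox",
}


def announce(post_id: str) -> dict:
    """Wrap a post in an Announce (a boost) sent by the community actor."""
    return {
        "@context": AS_CONTEXT,
        "type": "Announce",
        "actor": community["id"],
        "object": post_id,
    }


def delivery_hosts(follower_ids: list[str]) -> set[str]:
    """Assumption: one delivery per distinct follower host, not per follower."""
    return {urlparse(follower).netloc for follower in follower_ids}


followers = [
    "https://a.example/u/alice",
    "https://a.example/u/bob",
    "https://b.example/u/carol",
]
activity = announce("https://a.example/post/1")
hosts = delivery_hosts(followers)
print(activity["type"], sorted(hosts))
```

Every post and comment in the community triggers this fan-out from the one server that hosts the actor, which is why that server carries the load.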
I think we will need to use communities that are hosted on smaller instances. Beehaw.org and Lemmy.ml are bottlenecks. We need an instance per subreddit.
@schizanon either that or find ways to distribute the community across many instances. E.g. I want to post to the technology sub on Lemmy.ca and see the technology posts from lemmy.ca, lemmy.ml, beehaw.org and any other instance that has technology posts.
Some UX and architecture required to understand the best approach to federate a single community, but it seems like a natural Fediverse way of doing it. If lemmy.ml is down, then the community is resilient and doesn't rely on that one server.
@mkhoury @schizanon I think the most complicated aspect of this (as usual) is the question of moderation. Who moderates a community across multiple instances and how? What is the balance of power between instances and communities?
I'm not saying it can't be done, but I feel like it would be a tough one to do well.
@codesmith @schizanon it doesn't seem very hard to me. Every instance moderates the posts in their own instance. Every instance can federate their community with other instances who have the same community. As a user, I can choose to block/hide users and instances from my view of the community.
If I don't like the posts from lemmy.ml in technology, I can just hide them.
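As a rough sketch of that idea, a client could merge the same-named community from several instances and apply the reader's own hide list. Everything here (the `Post` type, `merged_feed`, the sample data) is hypothetical and only illustrates the user-side filtering, not any existing Lemmy feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    instance: str   # instance whose copy of the community holds the post
    author: str     # local username on that instance
    title: str
    published: int  # Unix timestamp


def merged_feed(posts_by_instance, hidden_instances=frozenset(), hidden_users=frozenset()):
    """Merge one community across instances, newest first, minus hidden sources."""
    visible = [
        post
        for posts in posts_by_instance.values()
        for post in posts
        if post.instance not in hidden_instances
        and f"{post.author}@{post.instance}" not in hidden_users
    ]
    return sorted(visible, key=lambda post: post.published, reverse=True)


technology = {
    "lemmy.ca": [Post("lemmy.ca", "ann", "Local news", 300)],
    "lemmy.ml": [Post("lemmy.ml", "bob", "Kernel release", 200)],
    "beehaw.org": [Post("beehaw.org", "cy", "New laptop", 100)],
}
feed = merged_feed(technology, hidden_instances={"lemmy.ml"})
print([post.title for post in feed])
```

If one instance is down, its entry is simply missing from the merge and the rest of the feed still renders.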
@mkhoury @codesmith so if anyone wants to *really* follow (at)technology they've got to follow it multiple times and then keep track of which instance abusive content is coming from the most? That's a lot more complicated than it was on Reddit.
That doesn't really prevent centralization either, in fact it encourages it; bigger instances are less likely to be defederated so users will consolidate on "too big to defederate" instances like Lemmy.ml.
What does hosed mean, technically?
I think they meant like "overloaded", like a hose spraying water, but the water being users from all around the fediverse
Lots of traffic, lots of posts, lots of comments, ... That's going to need more storage, more bandwidth, more CPU power, higher running costs.
Ideally, there would be a way to distribute this load across instances according to their resources, but from my (currently limited) knowledge, I don't think Lemmy/ActivityPub is really geared for that kind of distributed computing, and currently I don't believe that there's a way to move subs between instances to offload them (although I believe some people may be working on that).
Perhaps the Lemmy back-end could use a distributed architecture for serving requests and storage, such that anyone could run a backend server to donate resources.
For example, I currently have access to a fairly powerful spare server. I'm reluctant to host a Lemmy instance on it as I can't guarantee its availability in the long term (so any communities/user accounts would be lost when it goes down), but while it's available I'd happily donate CPU/storage/bandwidth to a Lemmy cloud, if such a thing existed.
There are pros and cons to this approach, but it might be worth considering as Lemmy grows considerably.
I don’t think it’s a problem. If you weren’t using ActivityPub and were running a centralized site like reddit instead, then you (the sysadmin) would also have to scale when your community gets really popular.
Stuff that gets linked to also has the same problem
https://www.jwz.org/blog/2022/11/mastodon-stampede/
(Btw I don’t like jwz but he mentions it here)
Funny how you say it's not a problem, then go on to describe the problem that needs to be dealt with. Dealing with scaling is a problem, and it's a problem that costs money.
Posts like this: https://lemm.ee/post/58472 suggest it is a problem. The rise in traffic seen by Lemmy in the last few days is absolutely tiny compared to a site like reddit, and already instances are struggling to cope. The recent growth in user registrations represents only about 0.007% of reddit's active user base. (~60K new Lemmy users vs 861,000,000 active monthly reddit users)
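For anyone checking the percentage, the arithmetic from the two figures quoted above works out as follows (the variable names are only for illustration):

```python
new_lemmy_users = 60_000             # approximate figure quoted above
reddit_monthly_active = 861_000_000  # figure quoted above

share_percent = new_lemmy_users / reddit_monthly_active * 100
print(f"{share_percent:.4f}%")  # prints 0.0070%
```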
There are 190+ Lemmy instances last time I checked, yet almost all the brunt of this load has been borne by a handful of servers, which see an inordinate amount of traffic while 100+ other servers sit almost idle. Why should a handful of "lucky" servers have to pay all the hosting costs? What if a volunteer-run instance explodes in popularity? It will simply fold, unless the volunteer has money to throw at the problem. A site like reddit costs millions to run.
Lemmy in the last few days is absolutely tiny compared to a site like reddit, and already instances are struggling to cope.
While this is true, 5 days ago lemmy.ml, the biggest instance, was on a 67 EUR server which is very small. https://news.ycombinator.com/item?id=36270094
Posts like this: https://lemm.ee/post/58472 suggest it is a problem
This is a scaling problem (having more users means you need more mods), but I disagree with how they handled it, and it isn't a money-related thing. My thoughts on this are in an older post from when this was first announced: https://partizle.com/comment/64178
Why should a handful of “lucky” servers have to pay all the hosting costs?
My initial idea is to use the Something Awful model of paying a one-time fee to register an account. The problem is that people would just sign up on another instance that doesn't charge a fee but still add load to the lucky instance. Another approach: to participate in communities on one of those lucky servers, you would need to pay a one-time fee to that server (comments would need to be removed by a bot if they're not made by an approved user). I'm not saying that's perfect, but it's an idea. AdSense is another idea.
Again, you say it's not a money problem.... then go on to describe a money problem!
Also, did you read the link included in the post I linked to? ( https://beehaw.org/post/520044?scrollToComments=true )
That's a money problem and a time problem. (And time problems are money problems.)
High traffic sites need lots of money and resources to run. That's just a fact.
We can solve this in many ways as Lemmy grows (and I think we will), but to just pretend there aren't any problems to be solved is naive, IMO.
If Lemmy grows to any significant percentage of reddit traffic, the Lemmy of tomorrow will (necessarily) look quite different to the Lemmy of today.
@beejjorgensen depends on the resources of the instance; most likely having the network link saturated, but possibly the memory or CPU too. I don't run an instance so I don't know the profile. DDoSed basically.
It may be worth filing an issue on the ActivityPub GitHub, because I agree that could become a huge problem, but it's also a lot of work to implement, because most instances would need to update to the newest ActivityPub standard once a new version is approved.
Did you post this from Mastodon? I wish I could tell where this came from.
Basically, if I understand this right: if you have an instance with a very popular community on it, it will likely need some massive infrastructure scaling to handle the enormous amount of worldwide traffic?
Yes. If you run the server, then you are the source of truth of that community. All other servers that federate your community query your server to access the community and show it to their users.
So if you run a server and a community explodes there, you might only have 500 users on your instance, but you might have 50k users reading that community and interacting with it from other Lemmy instances, thus your server needs to scale to 50k users worth.
And even more essential, your server is the source of truth of that community. So if your server is hacked or corrupted or deleted, that community is gone. Other instances don't mirror it (except for temporary caching), so the Lemmy network essentially is a trust network of other people maintaining servers long term (and each inventing a monetary system to pay for it). I still think the network might be better than a centralized system like reddit, but it definitely has a lot of growing to do and policies that need to be sorted out very soon.
So are these other servers just routing requests from their users to your server's community? Or are they actually copying everything over every so often (caching) and serving up the requests themselves? How real time is it, I guess is what I'm asking?
@Nymphioxetine posted from mastodon (because Lemmy is slow right now)
I wish I could tell where this came from.
Isn't that what this colourful icon in Lemmy is for? It appears to link to the original source of the post or comment.
While it's true that the hosts of popular communities will get more traffic, it's actually not as bad as it first seems.
Every Lemmy instance with at least one subscriber in that popular community will act as a mirror. That means that users who are just reading posts and comments will not cause any additional load on the home-instance of the popular community, because they are consuming local copies of the posts and comments.
This will actually help scaling a lot, and is in fact exactly how many centralized platforms scale (by creating a bunch of read-only copies of content).
As long as we can distribute the Lemmy userbase between different instances (and avoid creating one or two centralized super-instances), we can take a lot of advantage of this mirroring and the scaling will be quite good!
@sunaurus what about when those users like/boost/reply?
In those cases, the action will need to propagate back to the home server (that's where the "hosts of popular communities will get more traffic" comes from), but keep in mind - people usually read at least one or two orders of magnitude more than they write.
@sunaurus there's a lot of upvoting happening on popular subreddits
Absolutely, but a user will only upvote a post once, while they will read it on every reload of their page. (By "read" I mean "fetch it from their local mirror")
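A back-of-the-envelope model of that argument, as a sketch only: with mirroring, the home instance serves reads for its local users but receives writes from everyone. The function names and the 100:1 read-to-write ratio are assumptions for illustration (the ratio is in the "one or two orders of magnitude" range mentioned above), and the 500 local / 50k remote split borrows the example figures from earlier in the thread.

```python
def home_load_mirrored(local_users, remote_users, reads_per_user, writes_per_user):
    """Requests hitting the home instance when remote reads come from local mirrors."""
    reads = local_users * reads_per_user                     # only local readers
    writes = (local_users + remote_users) * writes_per_user  # votes and replies from everyone
    return reads + writes


def home_load_unmirrored(local_users, remote_users, reads_per_user, writes_per_user):
    """Requests hitting the home instance if every read and write went to it."""
    return (local_users + remote_users) * (reads_per_user + writes_per_user)


mirrored = home_load_mirrored(500, 50_000, reads_per_user=100, writes_per_user=1)
unmirrored = home_load_unmirrored(500, 50_000, reads_per_user=100, writes_per_user=1)
print(mirrored, unmirrored)  # prints 100500 5100500
```

In this toy model the home instance still receives every write, so heavy voting matters, but the read traffic that dominates stays on the mirrors.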
@sunaurus how often do retallies of vote counts get propagated?
Hmm, you could probably extend the protocol to do eventual consistency across instances if that ever becomes a problem: remote instances could keep their own counts and only send aggregated updates.
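A minimal sketch of what such aggregation could look like, assuming a hypothetical protocol extension: each remote instance buffers net vote deltas per post and periodically flushes one batch to the home instance. `RemoteVoteBuffer` and `HomeTally` are invented names, and per-user vote deduplication is deliberately left out.

```python
from collections import Counter


class RemoteVoteBuffer:
    """Held by a remote instance: accumulates net vote deltas per post."""

    def __init__(self):
        self.pending = Counter()

    def vote(self, post_id, delta):
        # delta is +1 for an upvote, -1 for a downvote
        self.pending[post_id] += delta

    def flush(self):
        """Return one aggregated batch for the home instance and reset the buffer."""
        batch = {post_id: delta for post_id, delta in self.pending.items() if delta != 0}
        self.pending.clear()
        return batch


class HomeTally:
    """Held by the home instance: applies aggregated batches to the scores."""

    def __init__(self):
        self.scores = Counter()

    def apply(self, batch):
        for post_id, delta in batch.items():
            self.scores[post_id] += delta


buffer = RemoteVoteBuffer()
for post_id, delta in [("post-a", 1), ("post-a", 1), ("post-a", -1), ("post-b", 1)]:
    buffer.vote(post_id, delta)

home = HomeTally()
first_batch = buffer.flush()
home.apply(first_batch)
print(first_batch, dict(home.scores))
```

Four votes become one message with two entries; the home score lags behind until the next flush, which is the eventual-consistency trade-off.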
@schizanon @technology@lemmy.ml @technology@beehaw.org Yep. I expect that long-term a lot of instances will limit the ability to create groups (as Beehaw does) or place restrictions on the size a group is allowed to grow before asking that group to move to another/their own instance or ask for finances to help with the costs.