Limits to Growth

October 14, 2009

I’ve talked about concerns about resource constraints and population control before on this blog, but that’s been in the context of fuel and food resources for everyone on the whole planet. This time I’d like to talk about limits to growth with regard to Fedora.

Today rawhide is showing 15,131 packages. The primary sqlite metadata is 41MB uncompressed, 9.7MB compressed.

That’s a pretty serious number on both counts.

Each package is roughly 3KB in primary; between primary and filelists it comes out to about 8KB per package.
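As a quick sanity check, the per-package figure follows directly from the totals above (a back-of-the-envelope sketch, not anything yum actually computes):

```python
# Back-of-the-envelope check of the per-package metadata cost,
# using the rawhide figures quoted above.
packages = 15131
primary_uncompressed = 41 * 1024 * 1024  # ~41MB of primary sqlite metadata

per_pkg_kb = primary_uncompressed / packages / 1024
print(f"~{per_pkg_kb:.1f}KB of primary metadata per package")  # ~2.8KB, i.e. "roughly 3K"
```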

Not a towering number, but one we have to think about and keep in mind as we grow. And not only that number, but also the disk space we use for CVS, the lookaside cache, mirrors, and Koji; the processing time for building all of those; and the overhead of keeping track of, and riding herd over, all the users and contributors who are maintaining all these pieces.

Now, we can add more infrastructure and more layers to help deal with the growth, but we’re way beyond a human-scale operation at this point. No one person is going to be able to keep track of everything going on by themselves. Even with tools, the picture is too chaotic to build a structured view of it, to know where everything is.

So the question I have is this: at what point do we think about this and take action? Anyone who says ‘never’ gets a remedial lesson in entropy.

And the follow-on question: if it were up to you, how would you do it?


5 Responses to “Limits to Growth”

  1. Mike McGrath Says:

    I guess I’d go back to a core + extras setup. I’d probably move to a core or common repo plus a few additional repos. The difficult decision here is what ends up in which repo. Some seem fairly obvious: a KDE repo, a GNOME repo, perhaps a games repo.

    But I’m sure there are plenty of GNOME users who use the occasional KDE app, and vice versa.

    I think this would require some additional smarts on the yum side as well, especially if the dep trees don’t line up cleanly, which they almost certainly don’t. Something that seems an obvious games-repo package might require something from both the KDE repo and the GNOME repo to build or install.
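    A toy sketch of that cross-repo problem (the repo layout, package names, and deps here are entirely made up, not real Fedora packages):

```python
# Toy model of a split-repo world: installing one "obvious games package"
# forces the depsolver to enable the GNOME and KDE repos as well.
repos = {
    "gnome": {"gtk2": []},
    "kde": {"kdelibs": []},
    "games": {"some-game": ["gtk2", "kdelibs"]},
}

def repos_needed(pkg):
    """Return the set of repos that must be enabled to install pkg."""
    for repo, pkgs in repos.items():
        if pkg in pkgs:
            needed = {repo}
            for dep in pkgs[pkg]:
                needed |= repos_needed(dep)  # recurse into dependencies
            return needed
    raise KeyError(f"no repo provides {pkg}")

print(repos_needed("some-game"))  # all three repos, not just "games"
```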

    Each of these repos would have its own dedicated team, probably similar to how Kubuntu works. At some point in the growth each repo group would almost certainly need some dedicated infrastructure. Our /mnt/koji can barely handle what we have now; we’re sure to outgrow it, and purchasing larger and faster SANs just won’t scale forever.

    But yeah, on the policy side I’d say team delegation plus responsibility spreading, still centrally overseen but mostly independent, similar to how McDonald’s franchises are run. On the technical side I’d say we’d need some way to join these different repos together.

  2. Stephen Smoogen Says:

    Well, remedial entropy just says that if you have a closed system it will eventually reach an equilibrium where no more work is possible. The way that engines and the like get around it is by making the ground-state system bigger (e.g. venting excess heat into the outer system). The question here is how to make the ground-state system bigger (and ensure that it is at a lower state than the engine).

    The first step is to guess at how large the system can grow, then work out how to efficiently shunt off ‘waste’. On an engine this is done through feedback systems to keep things from running too hot, and thermal systems to ship wasted heat out of the system. Now, sociology/political science is not my training, but the analogues would seem to be the following:
    1) Social bureaucracy. When done right it is a feedback loop that allows things to slow down so problems can be found and resolved before it’s too late. When done wrong it looks like a modern car engine, with sensors on top of sensors to detect when things have fallen out of some norm.
    2) Social shifts. The good old “Go West, young man” ideal, where people who could not conform (i.e. high social entropy) could go out to places unknown, thus ‘cooling’ the more ‘civilized’ country behind them. [The analogue of the radiator comes to mind.] Again, in small quantities it might be good, but as you get bigger it becomes caste systems and such.
    3) ???

    The questions are which methods work, and where they fall apart.

  3. James Says:

    > So the question I have is this: at what point do we think about this and take action?

    Never … well, ok, plan for never but then do something when it hurts really bad :).

    I don’t see the ~650 bytes of compressed data per package as that big of a problem, although I’m sure it will hurt eventually.
    But splitting the repos will hurt a lot, although maybe less so if done well (maybe a “leaf packages” repo).

    On the fact that “the whole can’t be seen by one human anymore”, I’m less sure that’s a problem. If atop/htop/whatever is broken, I just don’t care unless I use those apps.

    I think over time we’ll probably have more things like your “protected packages” work, where there are multiple classes of packages … but I think that’ll just be processes reflecting current/past reality (glibc is more important than atop), and not separate repos.

  4. Aurélien Says:

    That’s a problem that all distributions will have to face one day, as the amount of Free Software increases.

    Is there a way to go from our centralized (but well-controlled) distribution setup to something more decentralized? Maybe let the free software projects host the packages, and have the downloader follow a link to the rpm? Since rpms can be individually signed, we wouldn’t lose security.

    That would not solve the metadata size problem, however. Should we provide some kind of web service for yum to query? Maybe not; that would lower the bandwidth but go in the “centralized” direction.

    Can yum download incremental metadata updates, as it can do for rpms with Presto? I remember seeing apt do something like that.
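    The incremental idea, in miniature: ship only what changed since the client’s last metadata sync, rather than the full package list (a concept sketch with made-up package versions; real yum metadata is sqlite/XML, not a Python dict):

```python
# Concept sketch of delta metadata: compare the client's last-known
# version set against the mirror's, and transfer only the difference.
old = {"bash": "4.0-1", "glibc": "2.10-1", "atop": "1.23-1"}  # client's copy
new = {"bash": "4.0-2", "glibc": "2.10-1", "htop": "0.8-1"}   # mirror's copy

delta = {
    # packages that are new or have a different version
    "changed": {p: v for p, v in new.items() if old.get(p) != v},
    # packages that disappeared from the repo
    "removed": [p for p in old if p not in new],
}
print(delta)  # only bash, htop, and atop appear; unchanged glibc is skipped
```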

    Anyway, that’s a good question every distro will have to face soon. It would be nice to see Fedora take the lead again 🙂

  5. Stephen Smoogen Says:

    @Aurélien I think some decentralization would be needed, but the problem I see with the first proposal is that it increases other entropy that distributions remove.

    The issue is that many upstreams have their own ideas of where things should go if they are doing the packaging. You end up with some putting everything in /opt or /usr/local, or arguing over whether to use a BSD layout or a SYS-V layout. And then you have the upstream who is in an argument with another developer, so he decides to obsolete that person’s package.

    And having someone put

    Epoch: MaxInt
    Provides: kernel, glibc

    %post
    /sbin/reboot

    in a spec file would really make for a bad day.
