December 21, 2006
I got to play with a shell on an OLPC box last week, thanks to Chris Blizzard, and it had the advantage of being so slow and low on memory that it really shows off where yum performs for crap. I’ve been spoiled by fast hardware b/c of my job, so I hadn’t seen some of these issues at all. However, I’ve seen them now and we’re looking at some creative solutions.
The first one, and somewhat controversial, is possibly doing away with the xml->sqlite conversion on the client side. Essentially, on each run yum checks for new metadata, downloads the xml files, and imports them. In yum 3.0 Tambet Ingo introduced a C-based python module that does the xml->sqlite conversion very quickly. However, even with that module the OLPC box took 4-6 minutes to import the metadata for fedora core and extras. That’s a world of suck.

So, I started comparing the sizes of the compressed sqlite files and the compressed xml files, and checking to make sure that a sqlite db would play nice when moved from one architecture to another. All of that checks out. So, what we’re thinking of doing is dropping a premade sqlite db into each repository as another optional metadata piece. Yum clients that know to look for it could download it and skip the metadata conversion step entirely. On fast systems that’s really only saving 7-15 seconds. On slow systems it saves many minutes.
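To make the idea concrete, here’s a rough sketch of how a client could prefer a premade sqlite db when the repository advertises one in repomd.xml. The `primary_db` type name and the file paths are assumptions for illustration, not a settled format:

```python
import xml.etree.ElementTree as ET

# Hypothetical repomd.xml snippet: "primary_db" is an assumed name for the
# proposed premade-sqlite entry; "primary" is the usual xml metadata.
REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <data type="primary">
    <location href="repodata/primary.xml.gz"/>
  </data>
  <data type="primary_db">
    <location href="repodata/primary.sqlite.bz2"/>
  </data>
</repomd>"""

NS = "{http://linux.duke.edu/metadata/repo}"

def pick_primary_location(repomd_text):
    """Prefer the premade sqlite db if the repo offers one; otherwise
    fall back to the plain xml and convert it locally as before."""
    root = ET.fromstring(repomd_text)
    entries = {}
    for data in root.findall(NS + "data"):
        loc = data.find(NS + "location")
        entries[data.get("type")] = loc.get("href")
    return entries.get("primary_db") or entries["primary"]

print(pick_primary_location(REPOMD))  # repodata/primary.sqlite.bz2
```

Older clients that don’t know the new type just ignore it, which is what makes shipping it as an extra optional piece attractive.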
The next thing we’ll be doing is going through the on-disk db format to see if we missed anything that could make searching faster (which in turn makes depsolving faster). Additionally, Panu pointed out how much certain indexes help performance.
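The index point is easy to demonstrate with sqlite itself. This is a minimal sketch, not yum’s actual schema; the table and column names (`provides`, `name`, `pkgKey`) are illustrative assumptions:

```python
import sqlite3

# Toy stand-in for a provides table in the metadata db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provides (name TEXT, pkgKey INTEGER)")
conn.executemany("INSERT INTO provides VALUES (?, ?)",
                 [("libfoo.so.1", 1), ("bar", 2), ("libfoo.so.1", 3)])

# Without an index, every "who provides X?" lookup is a full table scan;
# with one, it's a b-tree seek.
conn.execute("CREATE INDEX providesname ON provides (name)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT pkgKey FROM provides WHERE name = ?",
    ("libfoo.so.1",)).fetchall()
print(plan)  # the plan should mention the providesname index

rows = conn.execute("SELECT pkgKey FROM provides WHERE name = ?",
                    ("libfoo.so.1",)).fetchall()
print(sorted(r[0] for r in rows))  # [1, 3]
```

Depsolving does that kind of lookup constantly, so turning scans into index seeks adds up fast on slow hardware.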
Finally, I think we’ll be rifling through the depsolving routines to look for obvious caching opportunities.
I’d like to ask folks who are interested in profiling or optimizing yum, w/o fundamentally altering its behavior, to take a look through this thread, and if you have something to add, please do.