Yum optimizations

December 21, 2006

I got to play with a shell on an OLPC box last week, thanks to Chris Blizzard. It had the advantage of being so slow and low on memory that it really shows off where yum performs for crap. I’ve been spoiled by fast hardware b/c of my job, so I’d not seen some of these issues at all. However, I’ve seen them now and we’re looking at some creative solutions.

The first one, and a somewhat controversial one, is possibly doing away with the xml->sqlite conversion on the client side. Essentially, on each run yum checks for new metadata, downloads the xml files, and imports them. In yum 3.0 Tambet Ingo introduced a C-based python module that does the xml->sqlite conversion very quickly. However, even with the module the OLPC box took 4-6 minutes to import the metadata of fedora core and extras. That’s a world of suck. So I started comparing the sizes of the compressed sqlite files and the compressed xml files, and checking that a sqlite file would play nice when moved from one architecture to another. Both checks seem to come out fine. So, what we’re thinking of doing is dropping a premade sqlite db into each repository as another optional metadata piece. Yum clients that know to look for it could download it and skip the metadata conversion step entirely. On fast systems that’s really only saving 7-15 seconds. On slow systems it saves many minutes.
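To make the idea concrete, here is a rough sketch of both halves: the repository side builds and compresses the sqlite file once, and the client prefers it when the repository advertises it, falling back to the xml otherwise. This is not yum's code. The metadata type names ("primary", "primary_db"), the table layout, the file names, and the package rows are all made up for illustration, and bz2 is just an assumed compression choice.

```python
import bz2
import os
import sqlite3
import tempfile


def build_db(path, packages):
    """Repository side: build the sqlite file once, when the repo is generated."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE packages (name TEXT, arch TEXT, version TEXT)")
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", packages)
    conn.commit()
    conn.close()


def publish(db_path):
    """Repository side: compress the database so it can sit next to the xml."""
    with open(db_path, "rb") as src:
        compressed = bz2.compress(src.read())
    out_path = db_path + ".bz2"
    with open(out_path, "wb") as dst:
        dst.write(compressed)
    return out_path


def choose_metadata(available):
    """Client side: take the premade db if the repo offers one, else the xml."""
    # "primary_db" and "primary" are hypothetical type names for this sketch.
    return "primary_db" if "primary_db" in available else "primary"


def load_premade(compressed_path, cache_path):
    """Client side: decompress into the cache and open it. No xml import step."""
    with open(compressed_path, "rb") as src:
        data = bz2.decompress(src.read())
    with open(cache_path, "wb") as dst:
        dst.write(data)
    return sqlite3.connect(cache_path)


def demo():
    workdir = tempfile.mkdtemp()
    server_db = os.path.join(workdir, "primary.sqlite")
    # Illustrative rows only, not real repository data.
    build_db(server_db, [("yum", "noarch", "3.0.1"), ("glibc", "i386", "2.5")])
    available = {"primary": "primary.xml.gz", "primary_db": publish(server_db)}

    kind = choose_metadata(available)
    conn = load_premade(available[kind], os.path.join(workdir, "cache.sqlite"))
    names = [row[0] for row in conn.execute("SELECT name FROM packages ORDER BY name")]
    conn.close()
    return kind, names
```

Because the premade db is optional, a client that doesn’t know about it, or a repository that doesn’t ship it, just takes the existing xml path.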

The next thing we’ll be doing is going through the db format on disk and seeing if we missed any format benefits that could make searching faster (which in turn makes depsolving faster). Additionally, Panu pointed out the performance advantage of certain indexes.
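As a small illustration of why indexes matter for this kind of lookup, the sketch below asks sqlite for its query plan before and after adding an index. The table and column names (provides, name, pkgKey) and the index name are assumptions for the example, not necessarily what yum’s db uses, and the exact wording of the plan differs between sqlite versions.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return sqlite's plan for a query; the detail text is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per capability a package provides.
    conn.execute("CREATE TABLE provides (name TEXT, pkgKey INTEGER)")
    conn.executemany(
        "INSERT INTO provides VALUES (?, ?)",
        ((f"capability-{i}", i) for i in range(5000)),
    )

    sql = "SELECT pkgKey FROM provides WHERE name = ?"
    params = ("capability-4242",)

    # Without an index sqlite has to scan every row of the table.
    before = query_plan(conn, sql, params)

    # With an index on the searched column it can seek straight to the match.
    conn.execute("CREATE INDEX provides_name ON provides (name)")
    after = query_plan(conn, sql, params)

    result = conn.execute(sql, params).fetchall()
    conn.close()
    return before, after, result
```

Printing `before` and `after` from `demo()` shows the plan switching from a table scan to an index search. The trade-off is a larger db file, which matters if the file is being shipped to clients.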

Finally, I think we’ll be rifling through the depsolving routines to look for obvious caching opportunities.
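One obvious kind of caching, sketched below, is remembering the answer to “who provides X?” so that a capability required by many packages only hits the db once. This is a toy and not how yum’s depsolver is actually structured. The class name, the schema, and the sample rows are invented for the example.

```python
import sqlite3


class CachedProvides:
    """Remember 'who provides X?' answers so repeated lookups skip the db."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.queries = 0  # number of lookups that actually reached sqlite

    def who_provides(self, capability):
        if capability not in self._cache:
            self.queries += 1
            rows = self.conn.execute(
                "SELECT pkg FROM provides WHERE name = ? ORDER BY pkg",
                (capability,),
            ).fetchall()
            self._cache[capability] = tuple(row[0] for row in rows)
        return self._cache[capability]


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows, for illustration only.
    conn.execute("CREATE TABLE provides (name TEXT, pkg TEXT)")
    conn.executemany(
        "INSERT INTO provides VALUES (?, ?)",
        [("libc.so.6", "glibc"), ("libxml2.so.2", "libxml2")],
    )

    lookup = CachedProvides(conn)
    # Many packages tend to require the same few capabilities.
    requirements = ["libc.so.6", "libxml2.so.2", "libc.so.6", "libc.so.6"]
    providers = [lookup.who_provides(req) for req in requirements]
    return providers, lookup.queries
```

In this demo four lookups cost two queries. A cache like this is only safe for as long as the metadata underneath it doesn’t change, so it would need to be thrown away whenever the metadata is refreshed.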

I’d like to ask folks who are interested in profiling or optimizing yum w/o fundamentally altering its behavior to take a look through this thread and if you have something to add, please do so.

4 Responses to “Yum optimizations”

  1. gajownik Says:

    Have you considered using 7zip instead of bzip2? It has a better compression ratio and decompresses files faster:

    [gajownik@rumcajs test]$ ll
    razem 15844
    -rw-r--r-- 1 gajownik gajownik 1671332 gru 21 10:05 primary.xml.gz
    -rw-r--r-- 1 gajownik gajownik 10086400 gru 21 10:06 primary.xml.gz.sqlite
    -rw-r--r-- 1 gajownik gajownik 2047385 gru 21 10:11 primary.xml.gz.sqlite.7z
    -rw-r--r-- 1 gajownik gajownik 2365303 gru 21 10:10 primary.xml.gz.sqlite.bz2
    [gajownik@rumcajs test]$ rm primary.xml.gz.sqlite
    [gajownik@rumcajs test]$ time 7za x primary.xml.gz.sqlite.7z

    7-Zip (A) 4.42 Copyright (c) 1999-2006 Igor Pavlov 2006-05-14
    p7zip Version 4.42 (locale=pl_PL.UTF-8,Utf16=on,HugeFiles=on,1 CPU)

    Processing archive: primary.xml.gz.sqlite.7z

    Extracting primary.xml.gz.sqlite

    Everything is Ok

    real 0m1.607s
    user 0m1.417s
    sys 0m0.187s
    [gajownik@rumcajs test]$ rm primary.xml.gz.sqlite
    [gajownik@rumcajs test]$ time bunzip2 primary.xml.gz.sqlite.bz2

    real 0m3.155s
    user 0m2.958s
    sys 0m0.189s
    [gajownik@rumcajs test]$

    (tested on a 650 MHz Pentium III)

  2. Frank Says:

    1. For systems with plenty of memory, those xml files can be compressed much smaller using http://xml-wrt.sourceforge.net/, which saves download time.

    2. For some packages, when you try to upgrade, the RPM dependency header is as large as the package RPM itself. Why is a header as large as 2 to 3 MB needed? The XML files should already contain the package dependency information, so the header should not be required during an upgrade.

    3. Using Fedora 6 with yum 3.0.1, a network connection is still required even with “yum -C check-update”. My network requires proxy authentication. When I run “yum -C check-update” with “proxy_password” disabled in yum.conf, yum 3.0.1 fails, whereas yum 2.6 works fine. The xml files are downloaded by wget and moved into the yum cache, so “yum -C” should not fail. In fact I use wget with a bash script to download the RPMs and xml files; yum is only used for “yum -C check-update”.

  3. Frank Says:

    Another point to add.
    4. Is it possible to change the display format?
    Change from 3 columns display to 1 column.
    3 columns:
    “openoffice.org-base.i386 1:2.0.2-5.20.2 updates.”
    1 column:
    “openoffice.org-base-2.0.2-5.20.2.i386.rpm”

