Lewis' Blog: Tales from the trenches of information technology


A sincere apology to users of my YUM repo mirror


You see, it all started in January, when I decided to do a good deed (I should have known better). I set up a YUM repository mirror for Netlabs.org, as OS/2 now uses RPM and YUM for (some) package management (I have a post in draft on that whole issue). It turns out, however, that for reasons I'll explain below, the XML (and related) metadata files in my mirror weren't getting updated. So while I was doing a great job of adding content, I wasn't updating the repository information, and subscribers' YUM clients had no idea the new packages were there.

So, for all of you out there who have come to rely over the past couple of months on my US mirror of the Netlabs YUM repository, I most humbly apologize for the inconvenience. It's fixed now, though.

The master repo does not run rsync, which severely limits my options for mirroring its contents. I've been using wget. Its -N (timestamping) option works by comparing the server's timestamp for a file (its Last-Modified header) against the local copy's. If the server doesn't send timestamps for the files it serves, -N is of no use.
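A HEAD request will tell you whether a server sends those timestamps. Below is a minimal sketch, assuming a Python 3 interpreter is available on the mirror host; the function names are hypothetical. It reports whether a response carries a parseable Last-Modified header, which is what wget -N depends on over HTTP.

```python
"""Check whether an HTTP server sends Last-Modified, which wget -N relies on."""
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified(headers: Mapping[str, str]) -> Optional[str]:
    """Return the Last-Modified value if present (header names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "last-modified":
            return value
    return None


def supports_timestamping(headers: Mapping[str, str]) -> bool:
    """True only if the response carries a Last-Modified date that parses."""
    value = last_modified(headers)
    if value is None:
        return False
    try:
        parsedate_to_datetime(value)  # raises on malformed dates
    except (TypeError, ValueError):
        return False
    return True


def fetch_headers(url: str) -> dict:
    """Issue a HEAD request (no body is transferred) and return the headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return dict(resp.headers.items())
```

For example, `supports_timestamping(fetch_headers("http://rpm.netlabs.org/"))` returns False when the header is missing. In that case wget -N has nothing to compare, so it cannot tell old files from new ones.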

Not wanting to download every file in the repository, I initially thought it prudent to use the -nc (no-clobber) option so that I would only get new files. Alas, the problem here is that I really do need to overwrite the contents of the /repodata directory for each branch of the repository. Files such as repomd.xml keep the same name while their contents change, and -nc will never replace a file that already exists locally.
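The rule the mirror actually needs is "no-clobber, except for metadata," and it fits in a small predicate. This is a minimal Python 3 sketch. It assumes repository paths are relative and use forward slashes; `is_metadata` and `should_fetch` are hypothetical names, not part of wget.

```python
"""Decide whether a mirror pass should download a given remote file."""
from pathlib import PurePosixPath


def is_metadata(rel_path: str) -> bool:
    """True for any file inside a repodata/ directory, at any depth."""
    return "repodata" in PurePosixPath(rel_path).parts[:-1]


def should_fetch(rel_path: str, exists_locally: bool) -> bool:
    """No-clobber for packages, but always refresh repository metadata.

    New files are always fetched. Existing files are skipped (like wget -nc)
    unless they live under repodata/, whose contents change in place.
    """
    if not exists_locally:
        return True
    return is_metadata(rel_path)
```

Packages are effectively immutable once published, so skipping them is safe. The handful of metadata files are small, so re-fetching them every pass costs little.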

So, my initial crontab entry:

15 * * * * wget.exe -c -nc -nv -r -R index.html -a c:/var/log/wget_rpm_netlabs_org_mirror.log http://rpm.netlabs.org/ >nul 2>&1

which translates to:

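at 15 minutes past every hour (running every 15 minutes would need 0,15,30,45 in the minute field), run wget, resume partially downloaded files, do not overwrite or re-fetch files that already exist locally, be less verbose, recurse into directories, reject index.html, append to the specified log file, and use the given URL as the root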

doesn't quite get the job done, because -nc means the /repodata files are never refreshed once downloaded. (The crontab entry above is also abridged: the real one starts by changing to the proper drive and directory for the web space.)

What I ended up doing (until I switch over to curlmirror.pl, or something better) was to run a second job from a script. It visits each /repodata directory and pulls fresh copies, overwriting the local content. I now run that five minutes after the initial "non-clobbering" pass. It's not a panacea, though: if any repo content is moved on (or removed from) the original, for example, I have no mechanism for removing the file(s) from mine.
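A second pass could also work from repodata/repomd.xml, the index YUM clients read first. That file lists every metadata file for the branch, so the script can refresh exactly those files. This is a minimal Python 3 sketch of that swapped-in technique, not the script described above. It assumes the standard repomd namespace, and the function names and branch list are hypothetical.

```python
"""Second pass: re-download each branch's repodata, overwriting local copies."""
import os
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterable, List

REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def metadata_hrefs(repomd_xml: bytes) -> List[str]:
    """List the files repomd.xml points at, e.g. repodata/primary.xml.gz."""
    root = ET.fromstring(repomd_xml)
    hrefs = []
    for data in root.iter(REPO_NS + "data"):
        location = data.find(REPO_NS + "location")
        if location is not None and location.get("href"):
            hrefs.append(location.get("href"))
    return hrefs


def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def save(local_root: str, rel_path: str, payload: bytes) -> None:
    """Write to a temporary file, then replace, so no one sees a half-written file."""
    dest = os.path.join(local_root, *rel_path.split("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    tmp = dest + ".part"
    with open(tmp, "wb") as fh:
        fh.write(payload)
    os.replace(tmp, dest)  # overwrites any existing copy


def refresh_repodata(base_url: str, local_root: str, branches: Iterable[str]) -> None:
    """Refresh metadata for each branch directory (names are hypothetical)."""
    for branch in branches:
        prefix = branch.strip("/")
        branch_url = base_url.rstrip("/") + "/" + prefix + "/"
        repomd = fetch(branch_url + "repodata/repomd.xml")
        # hrefs are relative to the branch root (the directory holding repodata/).
        for href in metadata_hrefs(repomd):
            save(local_root, prefix + "/" + href, fetch(branch_url + href))
        # Write repomd.xml last, so clients never see it before the files it lists.
        save(local_root, prefix + "/repodata/repomd.xml", repomd)
```

Usage would look something like `refresh_repodata("http://rpm.netlabs.org/", "c:/www/rpm", ["some-branch/"])`, where the local path and branch name are placeholders. Writing repomd.xml last matters because clients trust its checksums for the other files.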

Ideally, I could do this quite efficiently with rsync. My goal, however, is to cost the master maintainer as little effort as possible, which includes not asking him to set up rsync just for me.
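rsync would also handle removals: its --delete option deletes files that no longer exist on the sending side. Without it, one hedged way to find files that have vanished from the master is to compare the package list in a branch's freshly refreshed primary metadata against the local .rpm files. This is a minimal Python 3 sketch. It assumes a gzip-compressed primary.xml.gz in the standard "common" namespace, and the function names are hypothetical. It only reports candidates, leaving deletion as a deliberate step.

```python
"""Report local RPMs that the master's primary metadata no longer lists."""
import gzip
import os
import xml.etree.ElementTree as ET
from typing import List, Set

COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def listed_packages(primary_xml_gz: bytes) -> Set[str]:
    """Return the location href of every package in primary.xml.gz."""
    root = ET.fromstring(gzip.decompress(primary_xml_gz))
    hrefs = set()
    for pkg in root.iter(COMMON_NS + "package"):
        location = pkg.find(COMMON_NS + "location")
        if location is not None and location.get("href"):
            hrefs.add(location.get("href"))
    return hrefs


def stale_rpms(branch_dir: str, listed: Set[str]) -> List[str]:
    """Relative '/'-separated paths of local .rpm files the metadata omits.

    branch_dir is the local directory that contains repodata/, because the
    hrefs in primary.xml are relative to it.
    """
    stale = []
    for dirpath, _dirs, files in os.walk(branch_dir):
        for name in files:
            if not name.endswith(".rpm"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), branch_dir)
            rel = rel.replace(os.sep, "/")
            if rel not in listed:
                stale.append(rel)
    return sorted(stale)
```

Running this after the metadata refresh avoids comparing against an outdated package list.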

If someone has a better idea which doesn't involve running some convoluted Java app (yep, been there; done that) and can be scripted to be triggered from cron, I'm all ears.
