Don’t delay upgrades

I usually try to upgrade my computers fairly quickly once a new operating system or a new version of an application comes out, even if there aren’t any new features that I’m looking for. However, sometimes I’ve been a little slow to upgrade some servers because I’m too busy with other stuff or it’s too hard to schedule downtime. I was reminded this week why it’s a good idea to do upgrades sooner rather than later.

I had some computers where I had fallen about 1 year behind on upgrading the operating system. The security patches were applied, but there was a newer version of the OS and I just didn’t have the time to take care of it. Since then I moved out of that job, but still depended on the server. Now it’s up to someone else to upgrade this system and they’re much like me, very busy with other things. So time goes on and now this server is 3 years behind on the OS upgrade and there are some major changes in the OS.

Now it’s time to replace the hardware of the machine. Since it’s a Linux machine the standard answer is to just move the drives and keep going. I suspected there might be problems, so I left my number with my replacement. He started the replacement and, well, it didn’t go smoothly. As it turns out the new hardware wasn’t quite supported by the older OS, so the system would boot partially, but not completely. So we ended up doing a full upgrade across 3 minor and 1 major versions of the OS and then fixing up all of the little things that broke along the way.

In the end this probably took longer than it would have if we had done the upgrades along the way, because the system configuration would have been fresher in our minds. Plus the changes wouldn’t have been so drastic and things would likely have migrated much more easily.

So remember to make time to upgrade your systems right away.

Migrating bacula from MySQL to PostgreSQL

So I’ve been looking to migrate my bacula installation from MySQL to PostgreSQL. Personally I like PostgreSQL better and the claims on the bacula-users list were that it’s faster. So I did a bunch of reading, then tested the database conversion, and have finally made it through the process. Here are the results of how to do it. The system that I did the migration on is an OpenSUSE 11.2 system.

I first upgraded my install from bacula 3.0.3 to 5.0.0, still using MySQL. This was a pretty straightforward process. A little different because the packaging of the RPMs changed between 3.0.3 and 5.x.

Back up my existing config files and the database.

tar -czf /root/bacula-backup.tar.gz /etc/bacula
mysqldump -u bacula -ppassword bacula > bacula-3.0-mysql.sql

Remove the old RPMs

zypper remove bacula bacula-bat bacula-updatedb bacula-server

Install the new RPMs

zypper install bacula-console bacula-console-bat bacula-director-mysql bacula-storage-mysql bacula-client

Update the database

/usr/lib/bacula/update_mysql_tables -u bacula -ppassword bacula

The next step was the hard one: converting the database. I used a post from the bacula-users list to come up with the appropriate mysqldump line. I then repeatedly created the PostgreSQL database and tried to import the dump until it imported without serious errors. In the end this is the pipeline that created a good dump:

mysqldump -t -n -c --compatible=postgresql --skip-quote-names --skip-opt --disable-keys --lock-tables -u bacula -ppassword bacula \
  | grep -v "INSERT INTO Status" \
  | sed -e 's/0000-00-00 00:00:00/1970-01-01 00:00:00/g' \
  | sed -e 's/\\0//' > fixed-bacula-backup.sql

The mysqldump line is pretty much what was in the mailing list post, except that I did all of the tables at once. The grep is to get rid of inserts into the status table. I was having issues with duplicate keys and such and the bacula-users list assured me that this table is created by the make_postgresql_tables script. The first sed line is to fix some bad dates. MySQL allows a date of all zeros, PostgreSQL doesn’t, so I just bumped the 0 dates to the beginning of the unix epoch. The second sed line removes the extra null characters that showed up on all of the inserts into the log table. I’m not sure what caused these, but PostgreSQL doesn’t like to import them and this made it much happier.
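To make the effect of the three filters concrete, here is a small sketch that runs them over a made-up dump fragment. The `fix_dump` function name and the sample rows are hypothetical and only there for illustration; the grep and sed expressions are the ones from the pipeline above.

```shell
# fix_dump: apply the same three filters as the pipeline above to stdin.
# (Hypothetical helper name, used only for this illustration.)
fix_dump() {
  grep -v "INSERT INTO Status" \
    | sed -e 's/0000-00-00 00:00:00/1970-01-01 00:00:00/g' \
    | sed -e 's/\\0//'
}

# Made-up sample rows, not real bacula data:
#   row 1 is dropped by the grep,
#   row 2 gets its all-zero date moved to the Unix epoch,
#   row 3 loses the literal backslash-zero sequence.
printf '%s\n' \
  "INSERT INTO Status (JobStatus) VALUES ('T');" \
  "INSERT INTO Job (JobId, StartTime) VALUES (1, '0000-00-00 00:00:00');" \
  "INSERT INTO Log (LogId, LogText) VALUES (1, 'done\\0');" \
  | fix_dump
```

Note that the last sed has no `g` flag, so it removes only the first `\0` on each line. That was enough for my dump, but a dump with several of them per line would need the flag.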

I then set up a .pgpass file in root’s home directory so that I could secure the PostgreSQL database with a password and not put it in my bacula config files. You can learn about the pgpass file in the PostgreSQL documentation.
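As a rough sketch of what that involves: each line of a .pgpass file has the form host:port:database:user:password, and the file must not be readable by group or others, or the PostgreSQL client library ignores it. The `write_pgpass` helper below is a hypothetical name of mine, and the values in the usage comment are placeholders, not my real settings.

```shell
# write_pgpass FILE HOST PORT DATABASE USER PASSWORD
# Append one host:port:database:user:password entry to FILE and make sure
# only the owner can read it. Runs in a subshell so the umask change does
# not leak into the calling shell.
write_pgpass() (
  umask 077
  printf '%s:%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" "$6" >> "$1"
  chmod 600 "$1"
)

# Example call with placeholder values (left commented out so that pasting
# this block does not touch an existing file):
# write_pgpass "$HOME/.pgpass" localhost 5432 bacula bacula 'example-password'
```

A password that contains a colon or a backslash has to have that character escaped with a backslash in the file; this sketch does not handle that case.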

Next it’s just a matter of creating the PostgreSQL tables as the postgres user (or some other user with PostgreSQL superuser privileges):

./create_postgresql_database
./make_postgresql_tables
./grant_postgresql_privileges

And then loading in the data. This load took a little over an hour and a half on my system, so be prepared to wait a bit.

psql -Ubacula bacula < fixed-bacula-backup.sql

Now one needs to reset the sequences that PostgreSQL uses to autocreate ids. I started with the instructions in the bacula manual, but needed to add a couple of missing sequences.

SELECT SETVAL('basefiles_baseid_seq', (SELECT MAX(baseid) FROM basefiles));
SELECT SETVAL('client_clientid_seq', (SELECT MAX(clientid) FROM client));
SELECT SETVAL('file_fileid_seq', (SELECT MAX(fileid) FROM file));
SELECT SETVAL('filename_filenameid_seq', (SELECT MAX(filenameid) FROM filename));
SELECT SETVAL('fileset_filesetid_seq', (SELECT MAX(filesetid) FROM fileset));
SELECT SETVAL('job_jobid_seq', (SELECT MAX(jobid) FROM job));
SELECT SETVAL('jobmedia_jobmediaid_seq', (SELECT MAX(jobmediaid) FROM jobmedia));
SELECT SETVAL('media_mediaid_seq', (SELECT MAX(mediaid) FROM media));
SELECT SETVAL('path_pathid_seq', (SELECT MAX(pathid) FROM path));
SELECT SETVAL('pool_poolid_seq', (SELECT MAX(poolid) FROM pool));

Updates I needed to add:

SELECT SETVAL('device_deviceid_seq', (SELECT MAX(deviceid) FROM device));
SELECT SETVAL('location_locationid_seq', (SELECT MAX(locationid) FROM location));
SELECT SETVAL('locationlog_loclogid_seq', (SELECT MAX(loclogid) FROM locationlog));
SELECT SETVAL('log_logid_seq', (SELECT MAX(logid) FROM log));
SELECT SETVAL('mediatype_mediatypeid_seq', (SELECT MAX(mediatypeid) FROM mediatype));
SELECT SETVAL('storage_storageid_seq', (SELECT MAX(storageid) FROM storage));
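Since all of these statements follow one pattern, they can also be generated instead of typed. This is only a sketch: `gen_setval` is a hypothetical helper, and it assumes every sequence is named table_column_seq, which holds for the statements listed above but is worth checking against your own schema.

```shell
# gen_setval TABLE COLUMN
# Print the statement that moves TABLE's id sequence up to the current
# maximum of COLUMN. Assumes the sequence is named <table>_<column>_seq.
gen_setval() {
  printf "SELECT SETVAL('%s_%s_seq', (SELECT MAX(%s) FROM %s));\n" \
    "$1" "$2" "$2" "$1"
}

# The six sequences that I had to add, as table:column pairs.
for pair in device:deviceid location:locationid locationlog:loclogid \
            log:logid mediatype:mediatypeid storage:storageid; do
  gen_setval "${pair%%:*}" "${pair#*:}"
done
```

The output can be reviewed first and then piped into psql the same way the data was loaded.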

After that I needed to modify the Catalog section in my bacula-dir.conf file to use localhost for the “DB Address”, remove the mysql socket reference and remove the password reference.

I also needed to modify the backup catalog command to be this (all on one line):

RunBeforeJob = "/usr/lib/bacula/make_catalog_backup bacula bacula \"\" localhost"

Computer companies being cheap can be annoying

I have a 15″ Macbook Pro unibody for work. Recently I was looking at the hard drive specs as I needed to upgrade another user’s laptop. Turns out that the SATA controller is capable of 3Gbps, however the drive from Apple is only capable of 1.5Gbps. What’s really annoying about this is that I specifically requested a 7200RPM drive from Apple to get some extra performance. Granted, the interface usually isn’t the bottleneck here, but it’s still annoying.

Don’t change 2 things at once

So I’m working on a project that has a number of components to it. In particular the data is all stored in a MySQL database. For various reasons we wanted to convert this to a PostgreSQL database. So off I went working on a branch to make the changes and test the system. Meanwhile other parts of the system are changing as well, in particular the size of the input data. When it comes time to merge I get everything set up and then merge the changes in and all the tests pass, so I commit.

Then we notice that the nightly performance run on the continuous integration server is really slow, taking 2 hours instead of 15 minutes. We had noticed some slowness on logins before, but now the logins were slow and the software being tested was really slow. So we go about testing the CI server and find that openSUSE has been kind enough to keep track of the MAC address from the original system (we had installed on a different drive and chassis and then moved to this identical chassis). This caused the IPv6 link-local address of this machine to match that of the previous system, which happened to be on the same network. This is bad, so we changed the settings back to the right MAC address and things were better, but still slow. So we decided to reinstall the OS; there are only a couple of directories of data to save, so no big deal.
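For anyone wondering why a stale MAC address leads to a clashing IPv6 address: assuming the interface uses the classic modified EUI-64 scheme (I did not verify which scheme was in effect on this machine), the link-local address is built directly from the MAC by flipping one bit in the first octet and inserting ff:fe in the middle. The sketch below shows that derivation; `mac_to_link_local` is a hypothetical helper and the MAC address is made up.

```shell
# mac_to_link_local MAC
# Derive the modified EUI-64 link-local address for a MAC address given as
# six lowercase, colon-separated hex octets. The body runs in a subshell so
# the IFS change stays local.
mac_to_link_local() (
  IFS=:
  set -- $1                      # split the MAC into $1..$6
  # Flip the universal/local bit (0x02) of the first octet.
  first=$(printf '%02x' "$(( 0x$1 ^ 0x02 ))")
  # Insert ff:fe between the third and fourth octets.
  printf 'fe80::%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
)

mac_to_link_local 00:1a:2b:3c:4d:5e   # prints fe80::021a:2bff:fe3c:4d5e
```

Tools such as ip usually print the same address without the leading zero (fe80::21a:2bff:fe3c:4d5e). Two machines configured with the same MAC therefore end up with the same link-local address.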

After the reinstall, logins are faster, but the performance test is still slow. So we blame this on the changes to the input data. Over the next month or so we optimize the handling of the input data and find ways to reduce the input data some, but not back to the original size. The performance test is better, but still not back where it should be; we’re down to about 1.5 hours now.

Other tasks kept me busy, so I didn’t get back to this for another couple of weeks. At that point I’m running the performance test on another development machine and it’s running in 15 minutes! This is great! But it’s still running slow on the CI server. What’s different? So I start checking versions of software and all and everything matches up. What I do notice is differences in the drives. So I get another drive installed in the CI server so I can test different configurations of drives for the PostgreSQL data directory. Here are the results:

XFS (logbufs=8): ~4 hours to finish
ext4: ~1 hour 50 minutes to finish
ext3: 15 minutes to finish
ext3 on LVM: 15 minutes to finish
reiserfs: ~1 hour 50 minutes
ext3 barrier=1: ~15 minutes
ext4 nobarrier: ~15 minutes
jfs: ~15 minutes

So as you can see, the filesystem really makes a difference. It turns out the development machine was using ext3 on LVM and the CI server was running ext4. After posting to the postgres-performance mailing list about this, it turns out that I can have either speed or safety. With ext3, if the power goes out, I could have a corrupted database; with ext4 this isn’t likely to happen. Given that I’m doing research here, and that if the power goes out during a test we have much bigger problems, I switched to ext3 and left it at that.

Now if I had just changed one thing (the database), rather than the database, the input data and the CI server setup around the same time, I probably would have caught this much sooner. It also would help to have my development systems set up with not only the same software, but the same filesystems too.
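A quick way to compare machines on that last point is to ask each one which filesystem sits under the database directory. This is a minimal sketch that assumes a df with the GNU-style -T option, as found on Linux; `fs_type` is a hypothetical helper, and the path in the example is a placeholder for your PostgreSQL data directory.

```shell
# fs_type DIR
# Print the filesystem type that backs DIR.
# Relies on the -T (print type) option of GNU df; -P keeps each entry on
# one line so the type is always the second field of the second line.
fs_type() {
  df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Placeholder path; substitute the PostgreSQL data directory.
fs_type /
```

The type alone does not tell the whole story, since mount options such as barrier made just as much difference in the results above. On Linux the options in effect for each mount are listed in /proc/mounts.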