Lewis' Blog Tales from the trenches of information technology

4 Feb 2014 (5 comments)

Archiving JFS service logs on OS/2

Quick refresher:

IBM ported JFS from AIX to OS/2 for release with Warp Server for e-Business in 1999 [1], and it ultimately made its way into the Warp 4 client in 2000. Anecdotal evidence (read: my own personal conversations with people who know) says that the port was pretty rough around the edges, and many of the utilities were left in barely usable condition (many people to this day shy away from defragfs on OS/2). The hope (or so I'm told) was that a third party [2] would develop a better set of tools at some point down the road.

As with all things OS/2 at IBM, they pretty much "set it and forgot it." IBM finally (late 2000) GPL'd the source to facilitate porting to Linux [3], where development has continued. The initial port from AIX was not bootable. Thus, it was necessary to boot OS/2 from FAT or HPFS (or HPFS386, the 32-bit variant of HPFS) and then access however many JFS data partitions (type 35) might be in the system. Bootable JFS did make it to eComStation, however, with the 2.0 beta release sometime around 2005 [4]. JFS is the default filesystem for eComStation today.

Closer to the issue at hand:

On OS/2 (eComStation) and on Linux, the JFS filesystem allocates a limited amount of space for the service log (on eComStation, this is fifty 4K pages, or 200KB [5]). As the service log stores details of what a chkdsk (that's fsck, for you penguins out there) pass has done to the filesystem, this can be a bit small given the size of today's volumes (and JFS' filesystem size limits of 2TB on OS/2 and 4PB in the current Linux implementation, with 4 billion files maximum on OS/2 and apparently no upper limit on Linux). The rest of this article will focus on JFS as implemented in eComStation [6].
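
For anyone who wants to check the 200KB figure, here is a rough sketch of the arithmetic behind the expression quoted in footnote 5, written in Python purely for illustration. The aggregate block size of 4096 bytes is my assumption for the example, not something read from a real volume.

```python
# Worked arithmetic for the service log size, mirroring the expression
# quoted in footnote 5. AGGR_BLOCK_SIZE is an assumed example value.
L2PSIZE = 12                 # log2 of the page size, as defined in the JFS source
PAGE_SIZE = 1 << L2PSIZE     # 4096 bytes, i.e. one 4K page
SVCLOG_PAGES = 50            # "50 extra 4k pages for the chkdsk service log"

svclog_bytes = SVCLOG_PAGES << L2PSIZE   # same as 50 * 4096
svclog_kb = svclog_bytes // 1024

AGGR_BLOCK_SIZE = 4096       # assumption: 4K aggregate blocks
fsck_svclog_length = svclog_bytes // AGGR_BLOCK_SIZE   # length in aggregate blocks

print(svclog_bytes, "bytes =", svclog_kb, "KB =", fsck_svclog_length, "blocks")
```

With a smaller aggregate block size the block count goes up, but the space in bytes stays the same.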

The real issue:

Setting the log size limit aside for a moment, the point of this post is that the 200KB of service log space holds at most two logs: the one from the current chkdsk pass and the one from the previous pass. On the next pass, the current log becomes the previous one, the new log becomes current, and the older previous log is discarded. Keep this information in the back of your mind for future use.

Now, JFS being the journaling filesystem it is, it is common to have JFS.IFS in CONFIG.SYS loaded with the AUTOCHECK parameter [7] to just scroll through the journal at boot time. If the journal can be read, the system will replay the log and ensure that any transactions which should have been made to the filesystem were, in fact, completed. However, if the journal cannot be read (bad crash), a long chkdsk is triggered, and this can take a considerable amount of time. In addition, it is possible for corruption to be present on the disk and for the "fast" chkdsk to complete showing no errors (because it is not checking for errors, only for inconsistencies between what the transaction log says should have been written and what is actually on disk). Every now and again, a long chkdsk should be run while booted from another partition (what we still call a "maintenance partition" in OS/2 circles) or from other media (bootable CD or DVD). Of course, the problem with this is that if corruption is found on the disk and corrected, it may be necessary to review the log in order to determine what has been zapped...er...fixed.

Every time a disk check is run, whether from AUTOCHECK or from a command line, the service logs rotate. Thus, if we run a manual check (or an AUTOCHECK is run and the journal is not readable) and a hundred or so inodes are "released" [8], then shut the system down and reboot normally with AUTOCHECK set in CONFIG.SYS, the important log (detailing the hundred files we just lost) is now the previous log, and the one from AUTOCHECK (which will likely come up clean) is the current log.

Now let's just say that something has come up and we're not thinking of that previous JFS log and all of the important data it has in it from that excruciatingly long chkdsk pass, and further, the problem which caused the crash in the first place [9] has not been corrected, and the system goes down hard - again. What's going to happen? You guessed it: we're going to lose that first important service log, the log which is going to tell us how much email we lost (well, at least which POP3 folders we lost), how many spreadsheets were removed [10], how many text files are now gone, etc.
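
To make the rotation concrete, here is a toy model of the scenario just described, in Python. It is only a sketch of the behavior as I understand it: the class and the log descriptions are made up for illustration and have nothing to do with the actual JFS code.

```python
class ServiceLogSlots:
    """Toy model of the two service log slots: 'current' and 'previous'."""

    def __init__(self):
        self.current = None
        self.previous = None

    def run_check(self, description):
        # Every check rotates the logs: the old 'previous' log is discarded,
        # 'current' becomes 'previous', and the new pass becomes 'current'.
        discarded = self.previous
        self.previous = self.current
        self.current = description
        return discarded


slots = ServiceLogSlots()
slots.run_check("long chkdsk: 100 inodes released")
slots.run_check("AUTOCHECK after normal reboot: clean")
lost = slots.run_check("chkdsk after second crash")

print("current:  ", slots.current)
print("previous: ", slots.previous)
print("discarded:", lost)
```

After the third pass, the log from the long chkdsk is the one that falls off the end, which is exactly the log we wanted to keep.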

What can be done about this?

Well, for the moment, there's nothing to be done about the hard limit on the size of the service log. In a perfect world, it would be a function of the size of the volume, guesstimated on how many possible files might be present, and tunable with a setting at the time of format (well, it is on NSS, at least, and if only all filesystems were like NSS). No, the best thing we can do is dump the service log contents to a text file using the chklgjfs utility. Better yet, like all good admins, we want to create a batch file (this doesn't need to be fancy; we don't need REXX for this) to dump the log data at each boot. We can call the .CMD file from either STARTUP.CMD or even the Startup (or XWorkplace Startup) folder, and all it needs to do is this:

@echo off
chklgjfs c: >> c:\var\log\chklgjfs-c.log
chklgjfs j: >> c:\var\log\chklgjfs-j.log

(Obviously, adjust the drive letters to suit. The above is from my ThinkPad; my eComStation servers only have a single HPFS boot volume and one or more JFS volumes.)

Save the above as something like archive_jfs_logs.cmd. Be sure to call it at each system start, not at each desktop restart. That is why the default Startup folder may not be a particularly good choice and why STARTUP.CMD is (assuming you have a STARTUP.CMD).

Now, chklgjfs is another one of those quick and dirty, half-baked utilities from IBM. It has minimal command line help and is not documented in any of the standard literature. Thus, running it without parameters gets the terse response:

[[c:\os2]]chklgjfs
CHKLOG  Required parameter missing:  device specification
Usage: chklog [-L[:N|:P]] Device

(Yes, thanks for that, IBM.) However, since I've got "connections," you've come to the right place. My good friend Steven Levine cites the following (which you can take to the bank):

Dump jfs log

  Usage: chklgjfs [-L[:N|:P]] VolLetter:

    -L          Dump content of new log
    -L:N        Dump content of new log (default)
    -L:P        Dump content of previous log
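
Given those switches, one could dump the previous log as well as the current one for each volume. The following Python sketch only builds and prints the two command lines; it does not run anything. The function name and the output file naming are hypothetical choices of mine, and the only chklgjfs switches used are the -L:N and -L:P quoted above.

```python
def build_dump_commands(drive, log_dir):
    """Return (argv, output_file) pairs for dumping both logs of one volume.

    Only the switches quoted in the usage text (-L:N and -L:P) are used.
    The output file naming is a hypothetical convention for this sketch.
    """
    letter = drive.rstrip(":").lower()
    volume = letter + ":"
    commands = []
    for switch, label in (("-L:P", "previous"), ("-L:N", "current")):
        argv = ["chklgjfs", switch, volume]
        output_file = "{}\\chklgjfs-{}-{}.log".format(log_dir, letter, label)
        commands.append((argv, output_file))
    return commands


for argv, output_file in build_dump_commands("J:", "c:\\var\\log"):
    # Print the equivalent command line rather than executing it.
    print(" ".join(argv), ">>", output_file)
```

The same two lines per volume could just as easily go straight into the .CMD file; the point is simply that the previous log is worth capturing too.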

It's not a perfect solution (or a perfect utility!), but at least with this method (hopefully), you'll be able to preserve that precious log data a little longer [11]. ;-)

  1. Wikipedia: JFS (file system)
  2. ISV, in IBM-speak, or Independent Software Vendor
  3. Wikipedia, loc. cit.
  4. Serenity Systems: eComStation 2.0 Milestone 1
  5. // 50 extra 4k pages for the chkdsk service log
    fsck_svclog_length = (50 << L2PSIZE) / aggr_block_size;
    #define L2PSIZE 12 /* log2(PSIZE) */
  6. Though some of this could certainly be adapted for use in Linux, my openSUSE workstation which boots from JFS is not handy at the moment, so double-checking my sources from a real live patient isn't possible.
  7. CONFIG.SYS Documentation Project: IFS Statements - JFS.IFS
  8. JFS-speak for "I have no idea what to do with these things, so we're going to lose the file: too bad!"
  9. ...assuming that long chkdsk happened following a crash and not as part of our monthly maintenance plan...
  10. I say "removed" because while a disk check may write bits and pieces to lost+found, digging through there is an arduous task, and chkdsk does not - cannot - respect the DELDIR variable which directs a normally running system to copy deleted files to a recovery area for possible retrieval.
  11. ...until the next hard crash and we discover that the file is cross-linked and we need to release it - you did remember to run a backup after dumping the log data, right? right?
Comments (5)
  1. I believe the JFS on OS/2 is JFS2 and not JFS1.  I have read conflicting versions of whether JFS2 was ported to OS/2 or built originally on OS/2 and actually ported to AIX.

    • I puzzled over this for some time when writing this article, Andy.

      According to the Mini-FAQ, I think the original JFS1 was ported, though it is surely debatable as to whether it was JFS2 which was ultimately open sourced and which became the basis for bootable JFS:

      Q1. What is the history of the source based use for the port of JFS for Linux.
      
      A1. IBM introduced its UNIX file system as the Journaled File System (JFS)
          with the initial release of AIX Version 3.1.  This file system, now
          called JFS1 on AIX, has been the premier file system for AIX over the
          last 10 years and has been installed in millions of customer's AIX
          systems.  In 1995, work began to enhance the file system to be more
          scalable and to support machines that had more than one
          processor. Another goal was to have a more portable file system,
          capable of running on multiple operating systems.
       
          Historically, the JFS1 file system is very closely tied to the memory
          manager of AIX.  This design is typical of a closed-source operating
          system, or a file system supporting only one operating system.
       
          The new Journaled File System, on which the Linux port was based, was
          first shipped in OS/2 Warp Server for eBusiness in April, 1999, after
          several years of designing, coding, and testing.  It also shipped with
          OS/2 Warp Client in October, 2000.  In parallel to this effort, some
          of the JFS development team returned to the AIX Operating System
          Development Group in 1997 and started to move this new JFS source base
          to the AIX operating system.  In May, 2001, a second journaled file
          system, Enhanced Journaled File System (JFS2), was made available for
          AIX 5L.  In December of 1999, a snapshot of the original OS/2 JFS
          source was taken and work was begun to port JFS to Linux.

      I guess Shaggy could clear up some of that confusion. One of these days, it would be nice to have us in full sync with the Linux JFS builds.

  2. The above script is something I wish I had had in place in the past.  I did add a line between each drive:

    echo ____________________________________________________________________ >> d:\var\log\chklgjfs-u.log

    for the correct log for the correct drive of course, as a separation between this entry and the subsequent one.

