Programming Industrial Strength Windows

Chapter 12: File I/O

Figure 14 gives the highlights of how TextEdit handles files. On opening the file sample.txt, TextEdit immediately creates a copy of the file. In the figure, this copy is labeled “original copy of sample.txt,” but in fact, the name is a pseudo-garbage string generated by GetTempFileName, for example “te0123.tmp.”

Figure 14: File I/O in TextEdit. The Document class is in charge of this.

The Document class handles all file I/O. The interface to the Document class was introduced in Chapter 5; it’s time we took a look at the implementation.

Drive Type

To begin with, all the Document constructor knows is the file name. The file name is passed through getLongFileName before it is assigned to the m_strFileName member. Next, getDriveType figures out what kind of drive the file resides on. Actually, all we care about is whether the file resides on a CD-ROM or a floppy. If it’s on a CD-ROM, it’s obviously a read-only file; if it’s on a floppy, the floppy may be write-protected. In addition, the auto-save interval is much larger for floppies, since a floppy needs to be spun up before we can access it (see Editor::getAutoSaveTime).

The getDriveType function is a wrapper for the Windows function GetDriveType. If fed a UNC file name, this function returns DRIVE_NO_ROOT_DIR, for some curious reason.

Figuring out whether a floppy is write-protected is less straightforward than you might think. The only way I know is to attempt to create a file. If this fails with an error code of ERROR_WRITE_PROTECT, the floppy is write-protected. Merely opening a file for writing is insufficient, as this succeeds even on a write-protected floppy. You don’t get an error until you actually try to write to the file.

Here is the isWriteProtectedDisk function from fileUtils.cpp:

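bool isWriteProtectedDisk( LPCTSTR pszPath ) {

   assert( isGoodStringPtr( pszPath ) );

   PATHNAME szDir  = { 0 };
   // The first output argument of _tsplitpath receives the drive, e.g. "A:".
   _tsplitpath( pszPath, szDir, 0, 0, 0 );

   PATHNAME szTest = { 0 };

   // Suppress the system's critical-error dialog while we probe the disk.
   SilentErrorMode sem;
   if ( !GetTempFileName( szDir, _T( "te" ), 0, szTest ) ) {
      // No file was created, so there is nothing to delete.
      return ERROR_WRITE_PROTECT == GetLastError();
   }
   verify( DeleteFile( szTest ) );
   return false;
}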

SilentErrorMode

The SetErrorMode function controls who handles certain types of serious errors – you or Windows. In some cases, such as in the isWriteProtectedDisk function, I don’t want Windows to display a dialog box if a critical error occurs, so I must call SetErrorMode. Since I also want to restore the normal state of affairs afterwards, I have, as usual, created an exception-safe wrapper class, so that a simple variable declaration is sufficient. This class is named SilentErrorMode.

< Listing 49: SilentErrorMode.h >
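The listing itself is not reproduced on this page. As a rough sketch of the idea only, assuming the wrapper does nothing more than set the mode in its constructor and restore it in its destructor, such a class might look like the following. The flag combination is my guess, and the definitions in the #else branch are hypothetical stand-ins that exist only so the sketch compiles outside Windows; the real SilentErrorMode.h may well differ.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Hypothetical stand-ins so the sketch compiles outside Windows.
typedef unsigned int UINT;
const UINT SEM_FAILCRITICALERRORS = 0x0001;
const UINT SEM_NOOPENFILEERRORBOX = 0x8000;
static UINT s_uErrorMode = 0;
inline UINT SetErrorMode( UINT uMode ) {
   const UINT uOldMode = s_uErrorMode;
   s_uErrorMode = uMode;
   return uOldMode;   // like the real function, returns the previous mode
}
#endif

class SilentErrorMode {
public:
   // Switch off the system dialogs and remember the previous mode.
   SilentErrorMode()
      : m_uOldMode( SetErrorMode(
           SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX ) ) {
   }

   // Runs on normal exit and during stack unwinding alike.
   ~SilentErrorMode() {
      SetErrorMode( m_uOldMode );
   }

private:
   SilentErrorMode( const SilentErrorMode& );            // not copyable
   SilentErrorMode& operator=( const SilentErrorMode& ); // not assignable
   const UINT m_uOldMode;
};
```

Because the destructor does the restoring, a single declaration such as "SilentErrorMode sem;" covers every exit path from the enclosing scope, including exceptions.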

Opening Files

The Document::openFile method takes care of actually opening the file. To begin with, it tries to open the file in read/write mode. If that doesn’t give us a valid file handle, it’s time to listen to what GetLastError has to tell us.

If the error code is ERROR_INVALID_NAME, the likely explanation is that the file name contains wild cards. If a wild card pattern makes it this far, it’s because no files matched the pattern. The openFile method displays a message box to that effect, then uses the openFile function (not the Document method of the same name) to let the user select a different file (or create a new one). If the user clicks OK, it tries to open the new file, otherwise it throws a CancelException.

That message box, by the way, could be dispensed with, since the explanation could just as well have been presented as part of the Open File dialog. Next version, for sure.

If we still don’t have a valid file handle, and the error code is ERROR_FILE_NOT_FOUND, we try to append the default extension and open the file again. If the new error code is different from ERROR_FILE_NOT_FOUND, the file name with extension replaces the old file name. (A more advanced approach would be to try different extensions in turn, e.g., first .txt, then .cpp, and so on.)

If we still don’t have a valid file handle, and the error code is ERROR_ACCESS_DENIED or ERROR_SHARING_VIOLATION, we try to open the file in read-only mode.

If we still don’t have a valid file handle, and the error code is ERROR_FILE_NOT_FOUND, we call the getNewFile function to invoke FileNotFoundDlg, depicted in Figure 8.

If we still don’t have a valid file handle, we give up, and throw a WinException.
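The order of the retries is easier to see with the user interface stripped away. The sketch below covers only two of the steps described above (the default extension and the read-only fallback) and leaves out the wild-card handling and both dialogs. Everything in it is hypothetical: the OpenError values stand in for the GetLastError codes, and the Opener callback stands in for the actual attempt to open a file. It is not the code from Document.cpp.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for the Windows error codes named in the text.
enum OpenError {
   OPEN_OK, FILE_NOT_FOUND, ACCESS_DENIED, SHARING_VIOLATION, OTHER_ERROR
};

struct OpenResult {
   OpenError   error;      // outcome of the last attempt
   bool        readOnly;   // true if we had to fall back to read-only
   std::string fileName;   // the name that was finally used
};

// Callback that tries to open a file: ( name, readOnly ) -> outcome.
typedef std::function< OpenError( const std::string&, bool ) > Opener;

OpenResult openWithFallbacks( const std::string& name,
   const std::string& defaultExt, const Opener& tryOpen )
{
   // First attempt: read/write, name as given.
   OpenResult result = { tryOpen( name, false ), false, name };

   // Not found: try again with the default extension appended. The new
   // name is adopted only if it changes the outcome.
   if ( FILE_NOT_FOUND == result.error ) {
      const std::string withExt = name + defaultExt;
      const OpenError error = tryOpen( withExt, false );
      if ( FILE_NOT_FOUND != error ) {
         result.fileName = withExt;
         result.error = error;
      }
   }

   // No write access: settle for read-only.
   if ( ACCESS_DENIED == result.error ||
        SHARING_VIOLATION == result.error )
   {
      result.error = tryOpen( result.fileName, true );
      result.readOnly = ( OPEN_OK == result.error );
   }
   return result;
}
```

A caller would inspect result.error afterwards; anything other than OPEN_OK corresponds to the points where TextEdit shows a dialog or throws.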

We never have to think about closing files explicitly; the AutoHandle wrapper class (shown in Chapter 3) takes care of this.

Reading and Writing Files

TextEdit uses two different methods for reading and writing files. The copyFile function (fileUtils.cpp) uses the ReadFile and WriteFile functions, while everything else uses memory-mapped files. Memory-mapped files are handled by the FileMapping class, which takes care of all the tedious details.

< Listing 50: FileMapping.h>
< Listing 51: FileMapping.cpp>

Copying the Original

To get back on track: We’re still in the Document constructor, having just opened the file. Next, we use GetFileInformationByHandle to retrieve information about the file, then call the createOrgCopy method to create a copy of the original contents. This method uses getTempFileName to – you guessed it – get a temporary file name, then uses copyFile to actually copy the bytes. The file containing the original copy is then made read-only and also hidden, so as not to confuse innocent users.

The last thing the Document constructor does is to call setRunning( 1 ). Unless we call setRunning( 0 ) at some point, this ensures that TextEdit will start editing the same document again after a reboot.

At this point, the Document object is initialized and ready to go, and its users can start calling the interesting methods, such as getContents, getOrgContents, save and update.

What happens to the Document object when the user opens a new file? One possibility is to re-initialize the existing object; another possibility is to delete the existing object and create a new Document from scratch. TextEdit creates a new Document object, but sometimes it might be more convenient to re-initialize the old one.

Conversion

Figure 14 has two operations labeled “convert.” The inbound conversion does two things – it translates, if necessary, Unicode to ANSI (or vice versa), and it converts, if necessary, Unix-style line separators (a single new-line character) into MS-DOS line separators (a carriage return followed by a new-line character). The outbound conversion does the opposite, again, if necessary.
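Both conversions are conditional, so something has to decide whether a file uses Unix-style separators in the first place. Here is a minimal sketch of one possible test, assuming that a single new-line character without a preceding carriage return is enough to classify the file. The function name is hypothetical, and the sketch uses plain char strings rather than TCHAR to keep it portable; it is not necessarily how TextEdit makes the decision.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if the text contains at least one new-line character
// that is not preceded by a carriage return.
bool hasUnixLineSeparators( const std::string& text ) {
   for ( std::size_t i = 0; i < text.size(); ++i ) {
      if ( '\n' == text[ i ] &&
           ( 0 == i || '\r' != text[ i - 1 ] ) )
      {
         return true;
      }
   }
   return false;
}
```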

There is a design principle that says,

“Don’t optimize before you’ve measured if it’s worth it.”

The obvious reason is that there may be no noticeable (or even measurable) difference, in which case all your work will be for nothing. A less obvious (but more important) reason is that optimized implementations tend to be more complex than straightforward ones, so they are harder to get right and harder to debug. Consider sorting: If I only need to sort a few items, I invariably implement a bubble sort. It’s small, I am confident that I can get it right the first time, and I don’t have to crack a book to do it. A Quicksort, on the other hand, is considerably more complex, and I would never tangle with it without getting out a book or a previous implementation. Furthermore, I would test the Quicksort a lot more carefully than I would the bubble sort, including all limit cases I could think of, since it would be less obvious that I’d gotten all the details right.
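To show just how little there is to get wrong, here is a bubble sort of the kind I have in mind. It is a sketch for illustration (the function name and the choice of a vector of int are arbitrary), not code from TextEdit.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts the items in ascending order. Each pass moves the largest
// remaining item to the end of the unsorted part.
void bubbleSort( std::vector< int >& items ) {
   for ( std::size_t nUnsorted = items.size(); 1 < nUnsorted; --nUnsorted ) {
      for ( std::size_t i = 0; i + 1 < nUnsorted; ++i ) {
         if ( items[ i + 1 ] < items[ i ] ) {
            std::swap( items[ i ], items[ i + 1 ] );
         }
      }
   }
}
```

Two nested loops and one comparison; the running time is quadratic, which is exactly why it is only suitable for a few items.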

The conversions in Figure 14 are good examples of functions that really need optimization. It took me a while to notice this, as early versions worked like a charm on my test files, none of which exceeded a few tens of kilobytes. Once I started testing with larger files, the execution times shot through the ceiling.

Here is some early code from the outbound converter to remove carriage returns from a string:

if ( m_hasUnixLineFeeds ) {
   for ( int iChar = 0; 0 != pszNewContents[ iChar ]; ++iChar ) {
      if ( _T( '\r' ) == pszNewContents[ iChar ] ) {
         if ( _T( '\n' ) == pszNewContents[ iChar + 1 ] ) {
            // Shift the entire remainder of the string one position left.
            _tcscpy( pszNewContents + iChar, 
               pszNewContents + iChar + 1 );
         }
      }
   }
}

This snippet is obviously doing a lot of extra work, as it moves characters all the way to the end of the string each time a line separator is encountered, so the total work grows roughly with the number of lines times the length of the file. Still, the code is simple, and therefore difficult to get wrong. If the performance is good enough, as, indeed, it seemed to be, why jump through hoops when the only noticeable result will be an increased likelihood of bugs?

This code turned out to have a huge performance problem. I replaced the code above with the code below. On a 2.2 MB test file with 56,000 lines of text, the execution time of the loop fell from a good-sized coffee break (over 16 minutes) to 0.071 seconds. I had expected things to improve, but speeding things up by four orders of magnitude is definitely above average.

const int nBytesPerChar = 
   m_isUnicode ? sizeof( WCHAR ) : sizeof( char );
if ( m_hasUnixLineFeeds ) {
   LPCTSTR pszSrc = pszNewContents;   // read position
   LPTSTR  pszDst = pszNewContents;   // write position, never ahead of pszSrc
   for ( ;; ) {
      const LPCTSTR pszCR = _tcschr( pszSrc, _T( '\r' ) );
      if ( 0 == pszCR ) {
         // No more carriage returns: move the tail, terminator included.
         const int nLength = _tcslen( pszSrc ) + 1;
         memmove( pszDst, pszSrc, nLength * nBytesPerChar );
         break;                     //*** LOOP EXIT POINT
      }

      // Measured from pszSrc, up to and including the carriage return.
      const int nLineLength = pszCR - pszSrc + 1;
      memmove( pszDst, pszSrc, nLineLength * nBytesPerChar );
      pszDst += nLineLength;
      pszSrc += nLineLength;
      if ( _T( '\n' ) == pszSrc[ 0 ] ) {
         pszDst[ -1 ] = _T( '\n' );   // overwrite the CR with the new-line
         ++pszSrc;                    // and skip the original new-line
      }
   }
}

As you can see, the code is considerably more complex. I wrote the first version practically without thinking; the second version required considerably more care. Write trivial code whenever you can get away with it. It is faster to write, it is quicker to test, it is easier to understand, and it is smaller.

“Keep things as simple as possible, but not simpler than that.”

There was a similar problem in the inbound converter – my first implementation used the String::insert method to insert carriage returns, as this made for simple and obvious code. Unfortunately, the performance penalty I had to pay for this convenience was too high, since every insertion moves the rest of the string.

The inbound converter does one more thing – it detects and changes null characters. If any such are found (and changed), the m_bBinary member of Document is set, allowing us to warn the user (in onCreate in mainwnd.cpp).
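The cure for the inbound converter follows the same pattern as for the outbound one: build the result in a single pass instead of inserting into the middle of a string. The sketch below illustrates that pattern with std::string; it is not the code from Document.cpp. The function name and the pbBinary parameter are hypothetical, and replacing null characters with spaces is purely my assumption for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies the input, adding a carriage return before every new-line that
// lacks one and replacing null characters. Sets *pbBinary to true if any
// null character was replaced.
std::string toDosLineSeparators( const std::string& in, bool *pbBinary ) {
   assert( 0 != pbBinary );
   *pbBinary = false;

   std::string out;
   out.reserve( in.size() + in.size() / 16 );   // rough guess; grows if needed

   for ( std::size_t i = 0; i < in.size(); ++i ) {
      const char ch = in[ i ];
      if ( '\0' == ch ) {
         out += ' ';          // ASSUMPTION: nulls become spaces
         *pbBinary = true;
         continue;
      }
      if ( '\n' == ch && ( 0 == i || '\r' != in[ i - 1 ] ) ) {
         out += '\r';         // bare new-line: add the missing CR
      }
      out += ch;
   }
   return out;
}
```

Each input character is examined once and appended once, so the running time is proportional to the size of the file rather than to the size multiplied by the number of lines.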

Saving

A common approach to saving user files is to write the new contents to a new file. Once this new file is verified, the old file is deleted, and the new file is renamed. This approach, while careful of the contents, has a problem, especially on NTFS file systems: You may inadvertently change aspects of the original file, such as file attributes, original timestamps, owner, contents of named streams or security attributes. Even if you write the (non-trivial) code needed to save and restore this information, you’re still vulnerable to new features in new file systems.
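For comparison, the common approach can be sketched in a few lines of portable C++. This is a simplified illustration, not anything TextEdit does: the function name and the ".tmp" suffix are hypothetical, and the verification step is reduced to checking the stream state after writing.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Writes the contents to a temporary file, then replaces the original.
// Returns true on success.
bool saveViaTempFile( const std::string& fileName,
   const std::string& contents )
{
   const std::string tempName = fileName + ".tmp";   // hypothetical naming
   {
      std::ofstream out( tempName.c_str(),
         std::ios::binary | std::ios::trunc );
      out.write( contents.data(),
         static_cast< std::streamsize >( contents.size() ) );
      out.close();
      if ( !out ) {
         std::remove( tempName.c_str() );   // clean up the partial file
         return false;
      }
   }

   // Delete the old file; the result is ignored, since the file may not
   // exist yet. Then give the new file the old name.
   std::remove( fileName.c_str() );
   return 0 == std::rename( tempName.c_str(), fileName.c_str() );
}
```

Note that the file that ends up under the original name is a brand-new file, which is precisely how the attributes, timestamps and streams of the old one get lost.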

For this reason, TextEdit simply overwrites the existing file. Not so simply, really; the Document’s save method is quite careful. Still, additional safety could have been bought by creating a backup copy of the existing file before saving.

The Document class actually has two methods that pertain to saving file contents: The update method, which takes an LPTSTR and a character count as parameters, and the save method, which takes a raw byte array and a byte count as parameters. The update method uses the save method to actually save the contents; the save method is also used for Abandon Changes, which does a raw copy of the contents.

Conflict of Interest

There are two classes of read-only files: Those that have the read-only attribute set, and those that deny write access for a different reason. The file may reside on a CD-ROM, or (if the file system supports this) you might lack the requisite permissions. It may reside on a read-only floppy or it may be opened by another process. The difference between these two classes is that, in the first case, you have the option to change the file’s status to writeable at any time.

In either case, though, a different process may pull the rug out from under us at any time, or, if the file resides on a floppy, the user might eject it without first consulting with TextEdit. TextEdit uses the KISS principle to deal with this: If someone has deleted the file behind our backs, we simply recreate it. If a different process has locked it down so that we can’t write to it, or if a floppy has been ejected, we ask the user for a new file name. This solution is not perfectly smooth, but it seems to work well enough in practice – the important thing is to ensure that we don’t lose user data. Alternative approaches include file locking and monitoring the disk for changes; a solution based on this kind of thing would be considerably more complex.

< Listing 52: Document.cpp>

Last edited Aug 29, 2008 at 12:42 PM by petterh, version 5