CSCI 160 Lecture 13

Text Files

It is often the case that you wish to "maintain state" between runs of a given program - that is, save some values from the current run of the program to be used in subsequent runs. It is also frequently the case that the output of one program will later be used as input to another program. In either case, one standard method is to write the data to a file and then read it back in later as needed.

A file is handy because the contents of disk drives (and of other storage media such as magnetic tape) do not disappear when the program ends or the computer is shut off. Instead, they can be read later - even on another computer, if it is properly connected.

We are going to deal with simple text files. These are the sort of files produced by Notepad on Windows machines or by emacs or nedit on Unix machines. Their contents are nothing but ordinary ASCII characters - those whose codes run from 00000000 to 01111111 in binary (0 to 127 in decimal). Java has a multitude of file classes in its standard libraries, but we will restrict ourselves to the two that are handiest for input and output on text files - BufferedReader for reading and PrintWriter for writing.
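
As a minimal sketch of how these two classes can be used to "maintain state", the program below keeps a count of how many times it has been run. The class name RunCounter, the method names, and the file name runcount.txt are invented for this illustration, and the sketch assumes the file holds nothing but a single integer on its first line.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical example: remembers how many times the program has been run.
class RunCounter {

    // Reads the saved count, or returns 0 if the file does not exist yet.
    static int readCount(String fileName) throws IOException {
        File file = new File(fileName);
        if (!file.exists()) {
            return 0;
        }
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line = in.readLine();   // null means the file is empty
            return (line == null) ? 0 : Integer.parseInt(line.trim());
        }
    }

    // Writes the count to the file, replacing whatever was there before.
    static void writeCount(String fileName, int count) throws IOException {
        try (PrintWriter out = new PrintWriter(fileName)) {
            out.println(count);
        }   // leaving the try block closes the writer, which flushes its buffer
    }

    public static void main(String[] args) throws IOException {
        String fileName = "runcount.txt";   // assumed file name
        int count = readCount(fileName) + 1;
        writeCount(fileName, count);
        System.out.println("This program has been run " + count + " time(s).");
    }
}
```

Run the program several times in a row; the count should go up by one each time, because the value survives in the file after each run ends.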

Both of these classes depend on the concept of buffering, which is very efficient for reading from and writing to "block" devices like disks and some tape drives. A device is a "block" device when it has a minimum "chunk" of data that can be either read from or written to the device. The "chunk" on a disk drive is at least a sector, since a sector is the minimum size piece of data that can be read from or written to a disk at one time. (Some operating systems - like DOS and Windows - use even larger chunks called clusters as their minimum unit, for various reasons. A cluster is some multiple of a sector in size, the multiple depending on various factors.)
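
The arithmetic behind "chunks" can be sketched as follows. The 512-byte sector size is only an assumed figure for illustration (real devices vary), and the class and method names are invented.

```java
// Hypothetical helper: how many whole chunks must a block device transfer?
class ChunkMath {

    // Number of chunks needed to cover byteCount bytes, rounded up.
    static long chunksNeeded(long byteCount, long chunkSize) {
        return (byteCount + chunkSize - 1) / chunkSize;   // integer division, rounded up
    }

    public static void main(String[] args) {
        long sector = 512;   // assumed sector size in bytes
        System.out.println(chunksNeeded(100, sector));   // prints 1
        System.out.println(chunksNeeded(513, sector));   // prints 2
    }
}
```

Under this assumption, even a request for 100 bytes makes the device transfer a full sector, and a request for 513 bytes makes it transfer two.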

The problem with these devices is that, for a given program, the amount of data requested by any given read is usually smaller than that "chunk". (Sometimes it is larger, but buffers work well in that case, too.) A buffer is a piece of memory that is set aside and is normally that "chunk" size. Any read from the disk will then bring in the appropriate "chunk" and store it in the buffer. The program will do all its reading from the buffer. It will continue to read until it is done or until it needs more data than is stored in the buffer. At that point, the operating system will suspend the program and start a read from the disk to refill the buffer. Once the buffer is filled, the operating system will restart the program, which can then continue to execute. Notice that the buffer acts as a sort of balancing agent between what the program needs and what the disk or tape can supply.
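
The refill cycle just described can be sketched in Java. This is only a teaching model: the class name BufferSketch, the char array standing in for the disk, and the 4-character chunk size are all invented for illustration, and it is not how BufferedReader or any operating system is actually implemented.

```java
// Hypothetical model of a single buffer sitting between a program and a device.
class BufferSketch {
    private final char[] device;   // stands in for the contents of the disk
    private final char[] buffer;   // one "chunk" of memory
    private int devicePos = 0;     // next unread position on the "device"
    private int bufferPos = 0;     // next unread position in the buffer
    private int bufferLen = 0;     // how many valid characters the buffer holds
    private int deviceReads = 0;   // how many times we had to go to the "device"

    BufferSketch(String contents, int chunkSize) {
        device = contents.toCharArray();
        buffer = new char[chunkSize];
    }

    // Returns the next character, or -1 when the data is used up.
    int read() {
        if (bufferPos == bufferLen) {            // buffer is exhausted
            if (devicePos == device.length) {
                return -1;                       // nothing left on the device
            }
            refill();
        }
        return buffer[bufferPos++];
    }

    // Copies the next chunk (or whatever is left) from the device into the buffer.
    private void refill() {
        bufferLen = Math.min(buffer.length, device.length - devicePos);
        System.arraycopy(device, devicePos, buffer, 0, bufferLen);
        devicePos += bufferLen;
        bufferPos = 0;
        deviceReads++;
    }

    int getDeviceReads() {
        return deviceReads;
    }

    public static void main(String[] args) {
        BufferSketch in = new BufferSketch("ABCDEFGHIJ", 4);
        int programReads = 0;
        while (in.read() != -1) {
            programReads++;
        }
        // 10 characters in chunks of 4: the program reads 10 times, the device only 3
        System.out.println(programReads + " program reads, "
                + in.getDeviceReads() + " device reads");
    }
}
```

The point of the model is the ratio: the program asks for one character at a time, but the slow device is visited only once per chunk.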

The previous paragraph describes buffering at its simplest. However, very few modern operating systems would supply just one buffer to a program. It is more normal to supply at least two. With two, the system does something called double buffering. When the file is opened at the beginning, the operating system starts a pair of reads which will fill both buffers. Your program waits until at least the first buffer is filled and then starts executing. Your program reads until it exhausts the first buffer. If the second buffer has been filled, your program continues to execute on the second buffer while the operating system fills the first buffer with another read. When you have finished the second buffer, you read from the first again, circling around and around. The advantage is that your program and the reading of the other buffer often proceed at the same time, so your program has to wait less often and therefore executes more efficiently. (Many systems use more than two buffers, called multiple buffering, to get the same effect.)
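
The alternation between two buffers can be imitated inside a Java program, with a background thread playing the role of the operating system. This is an analogy under stated assumptions, not what the operating system or the Java library really does: the class name DoubleBufferSketch, the helper startFill, and the use of a Reader as the "device" are all invented here.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model: one thread fills a buffer while the program uses the other.
class DoubleBufferSketch {

    // Asks the background "filler" to read the next chunk into the given buffer.
    private static Future<Integer> startFill(ExecutorService filler,
                                             Reader source, char[] buffer) {
        Callable<Integer> task = () -> source.read(buffer, 0, buffer.length);
        return filler.submit(task);
    }

    // Reads everything from source, alternating between two buffers.
    static String readAll(Reader source, int chunkSize)
            throws InterruptedException, ExecutionException {
        char[][] buffers = { new char[chunkSize], new char[chunkSize] };
        ExecutorService filler = Executors.newSingleThreadExecutor();
        StringBuilder result = new StringBuilder();
        try {
            int current = 0;
            Future<Integer> pending = startFill(filler, source, buffers[current]);
            while (true) {
                int count = pending.get();       // wait until the current buffer is filled
                if (count == -1) {
                    break;                       // the source is used up
                }
                int other = 1 - current;
                // Start refilling the other buffer before we use the current one.
                pending = startFill(filler, source, buffers[other]);
                result.append(buffers[current], 0, count);   // the program's "work"
                current = other;                 // circle around to the other buffer
            }
        } finally {
            filler.shutdown();
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readAll(new StringReader("ABCDEFGHIJ"), 4));
    }
}
```

While the program appends the contents of one buffer, the filler thread may already be reading the next chunk into the other, which is the overlap that double buffering is meant to provide.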

Both BufferedReaders and PrintWriters gain efficiency from this.
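
One visible consequence of buffering on the output side is that text given to a PrintWriter may not reach the file until the buffer is flushed or the writer is closed. The sketch below measures the file length before and after a flush; the class and method names are invented, and the exact length seen before the flush depends on the library's internal buffer, so treat it as an observation rather than a guarantee.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical demo: shows that output can sit in a buffer until it is flushed.
class FlushDemo {

    // Returns the file's length just before and just after flushing the writer.
    static long[] lengthsBeforeAndAfterFlush(File file) throws IOException {
        long before;
        long after;
        try (PrintWriter out = new PrintWriter(file)) {
            out.println("hello");
            before = file.length();   // the text is probably still in the buffer
            out.flush();              // force the buffered text out to the file
            after = file.length();
        }
        return new long[] { before, after };
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("flushdemo", ".txt");
        long[] lengths = lengthsBeforeAndAfterFlush(file);
        System.out.println("before flush: " + lengths[0] + " bytes");
        System.out.println("after flush:  " + lengths[1] + " bytes");
        file.delete();
    }
}
```

The first length is typically 0 because such a short line does not fill the buffer. This is why a program should always close (or at least flush) a PrintWriter before it ends; otherwise the last part of the output can be lost.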


Lynn Ziegler,