DATA FILES

            A data file is an organized collection of related data.
        Previously we've talked about files that were either program
        code or documents.  But files can also store data that is
        used by programs, sort of like DATA statements external to
        the program.  The amount of data available this way is
        greatly increased, because we're no longer restricted to the
        amount of memory in our computer.  CDOS extends the entire
        set of sequential file handling statements available in the
        built-in Microsoft BASIC to include files stored on disk,
        and adds a new collection of more powerful random access
        file handling statements.
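
            For comparison, here is a minimal sketch of data kept
        inside a program with READ and DATA statements.  The variable
        names and the names in the DATA lines are only sample values
        chosen for illustration.

```basic
10 REM Sketch: data stored inside the program itself
20 FOR I=1 TO 2
30 READ L$,F$
40 PRINT F$;" ";L$
50 NEXT I
60 DATA Kent,Clark
70 DATA Hammer,Mike
```

            A data file plays the same role as lines 60 and 70, except
        that the data lives on disk rather than in the program, so it
        can be changed without editing the program.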


        FILE STRUCTURE: RECORDS AND FIELDS

            Let's look at what constitutes a data file.  The main
        unit of data in a file is called a record.  In the programs
        and documents we've worked with in BASIC and TEXT, a record
        is equivalent to one line; that is, everything up to a
        carriage return.  Unlike BASIC listings and documents, in a
        data file each line or record contains the same type of
        information as every other record in the file.

            For example, in a name and address file, each record in
        the file might contain these items: last name, first name,
        street address, city, state, and zip code.  They appear in the
        same order in each record.  A program written to use this
        data file reads individual records from the file and finds
        the desired information within each record.  The individual
        items within a record, each holding the same type of
        information from one record to the next, are called fields.
        Here's a peek at the structure of a typical data file...the
        rows are records and the columns are fields:

        last name  first name  street    city       state  zip

        Groovy     George      High      Somewhere  CA     97111
        Kent       Clark       Sooper    Metropolis NY     12002
        Gurdy      Hurdy       Mellow    Desert     NM     54923
        Hammer     Mike        Tough     New York   NY     10001

            Each row (or each line) corresponds to one record in the
        file.  So, the second record contains information on our
        flying friend in the flashy pants.  The first field in the
        second record contains "Kent".  The second field in the second
        record contains "Clark", and so forth.  Of course, this can be
        made about as simple or complex as your program's needs
        require.




        SEQUENTIAL FILES

            The only type of file handling provided by the built-in
        Microsoft BASIC in the Model 100/200 is sequential file
        handling.  There is just one way to find a particular record
        in a file: start at the beginning of the file and read each
        record in sequence until we come to the record that we want to
        look at.  Likewise, if we want to create a new file we can
        only start at the beginning and add records one after the
        other in sequence.  And if we just want to add one record to
        an already existing file, the only place it can be added is at
        the end of the file.  That's why it's called sequential...we
        can only do things in sequence, from beginning to end.
        However, there's still quite a bit that can be done with
        sequential file operations.  And CDOS enables us to do them
        with the Chipmunk disk drive, thereby expanding the amount of
        storage available and reducing the time and bother associated
        with files on cassette.
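
            The read-until-found approach can be sketched as follows.
        This is only an illustration: the file name 0:NAMES.DO is
        hypothetical, each line of the file is assumed to be one
        record that begins with the last name, and CDOS is assumed to
        accept OPEN...FOR INPUT, EOF, and LINE INPUT # on a disk file
        just as the built-in BASIC does on a RAM file.

```basic
10 REM Sketch: sequential search for a record by last name
20 INPUT "Last name";T$
30 OPEN "0:NAMES.DO" FOR INPUT AS 1
40 IF EOF(1) THEN PRINT "Not found":GOTO 80
50 LINE INPUT #1,R$
60 IF LEFT$(R$,LEN(T$))<>T$ THEN 40
70 PRINT R$
80 CLOSE #1
90 END
```

            Line 60 sends the program back for the next record until
        one begins with the name typed in, so a record near the end of
        a long file takes longer to find than one near the beginning.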


        CREATING A SEQUENTIAL FILE

            Let's create and use a sequential file.  How about a
        birthday and anniversary file?  That's something all absent-
        minded programmers could use.  Our file will consist of each
        person's name, his/her birthday or anniversary, a key of some
        kind to tell us whether it's a birthday or anniversary, and
        his/her favorite color, so we know what color socks or scarf
        to buy.  Now for decision time: how will each record look?
        Well, let's just set up a table like in the name and address
        example above.  That's basically what a file is...a table.

            Last Name    First Name    Key    Date    Color

            So far, so good.  But more decisions await.  How about
        data types?  That is, are the entries going to consist of
        numbers or strings?  And if numbers, are they integers, single
        precision, or double precision?  (Please refer to the BASIC
        manual section on Data Types.)  In our file, it'll be easiest
        to just make all the entries strings.  And we can use a B or
        an A as our key for birthday or anniversary.  Here are the
        variable names we'll use: LN$ for Last Name, FN$ for First
        Name, KY$ for Key, DT$ for Date, and CL$ for Color.

            We create the file the same way a RAM file or CAS file is
        created: we OPEN it for OUTPUT.  Keep in mind that if the
        filename we use is already present on disk in the folder we
        have open, it will be erased.

            10 OPEN "0:BDAY.DO" FOR OUTPUT AS 1

            Now let's enter the data into the file.  A simple loop
        with a terminating condition of a blank line on an input
        prompt will do.  The data is placed into the file with the
        PRINT # statement.  Here's the program:

            10 OPEN "0:BDAY.DO" FOR OUTPUT AS 1
            20 LN$="":INPUT "Last Name";LN$:IF LN$="" THEN 100
            30 INPUT "First Name";FN$
            40 INPUT "Anniversary/Birthday Key  (A/B)";KY$
            50 INPUT "Date";DT$
            60 INPUT "Color";CL$
            70 PRINT #1,LN$,FN$,KY$,DT$,CL$
            80 GOTO 20
            100 CLOSE #1
            110 END

            Notice how all the variables are in one PRINT # statement
        in line 70.  Why not just PRINT #1,LN$:PRINT #1,FN$:PRINT
        #1,KY$:...etc?  Because each variable printed will be followed
        by a carriage return, putting it on a new line, and thereby
        making it harder to find the end of each record when the file
        is read.  (And, it takes more code.)  As we'll see shortly,
        it's best to keep all the fields in one record within one line
        in the file.

            Here, we've used commas between the variables.  Load the
        file just created into TEXT and take a look at the data.
        Where did all those spaces between fields come from?  Well,
        when we use commas between variables in a PRINT # statement,
        BASIC pads the space between variables with a fixed number of
        spaces, unlike PRINTing to the screen (in which case it pads
        with a variable number of spaces up to a certain width).
        Thus, the fields are very far apart and they waste a lot of
        space in the file.  This certainly isn't desirable.  We can
        get rid of the spaces by using semicolons between variables:

            70 PRINT #1,LN$;FN$;KY$;DT$;CL$

            Try that and see what a record looks like now.  Now, there
        are no spaces at all between fields!  How do we tell where one
        field ends and another begins?  This becomes important when
        reading the file.  We'll want to be able to easily separate
        the fields and assign each to its own variable.  There are two
        main approaches to this problem: fixed-length fields and field
        delimiters.


        FIXED-LENGTH FIELDS

            One solution involves defining fixed-length fields.  If
        each field is a certain length, it's easy to separate the
        fields when reading the file. Knowing how long each field is,
        a program can figure out where each begins and ends in
        relation to the beginning of the record.  This can be done
        with a LINE INPUT statement that assigns the contents of a
        record to one variable, followed by string manipulation
        statements that separate out individual fields and assign them
        to their own variables.  When deciding how many characters to
        allow for each field, take into account total record length as
        well as individual field length.  When reading a record with a
        LINE INPUT statement, the record length can't exceed 255
        characters.  This is BASIC's limit for a string variable.

            If we decide to allow ten characters for the Last Name
        (LN$) field, how do we handle input that is longer than or
        shorter than ten characters, as it most likely will be?  If
        it's longer, we need to decide whether to just truncate (chop
        off) the extra characters, or to warn the user of the
        excessive length and prompt for a shorter input.  If we choose
        to truncate long input, it seems reasonable to just take LN$ =
        LEFT$(LN$,10).  However, that doesn't help with input shorter
        than ten characters.  Short input must be padded with spaces
        to make LN$ exactly ten characters in length.  We might
        choose to add a statement something like
        LN$=LN$+LEFT$("10spaces",10-LEN(LN$)), where "10spaces" stands
        for a string of ten space characters.  This works, but it
        involves a lot of coding for each variable.  There is a better
        and easier way!
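
            For reference, the by-hand approach just described might
        look something like this for a single field.  This sketch uses
        SPACE$ to build the padding instead of a literal string of
        spaces, and the brackets printed in line 40 are there only to
        make the field width visible on the screen.

```basic
10 REM Sketch: force LN$ to exactly ten characters by hand
20 INPUT "Last Name";LN$
30 IF LEN(LN$)>10 THEN LN$=LEFT$(LN$,10)
40 IF LEN(LN$)<10 THEN LN$=LN$+SPACE$(10-LEN(LN$))
50 PRINT "[";LN$;"]"
```

            Lines 30 and 40 would have to be repeated, with different
        lengths, for every field in the record.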

            Fixed-length fields can be easily printed to a file with
        the PRINT USING statement.  The PRINT USING statement formats
        the output by printing listed variables at specified print
        positions of specified length.  The back-slash character, "\",
        is used to format strings.  Strings longer than the allowed
        length are truncated, and strings shorter than the allowed
        length are left-justified (printed at the left edge of the
        column) and padded with spaces.  (We may still want to check
        for excessive length and warn the user.)  Let's allow ten
        characters for the last and first names, and eight characters
        for the date and color.  The date will be in the format
        "mm/dd/yy".  Unfortunately, since with a formatted print each
        backslash also allots space for one character, the minimum
        number of characters we can use for a string is two, and
        therefore we have to use two characters for the key.  Here's
        our revised birthday file program:

            10 F$="\        \\        \\\\      \\      \"  ' format
               string
            20 OPEN "0:BDAY.DO" FOR OUTPUT AS 1  ' create new file
            30 LN$="":INPUT "Last Name";LN$:IF LN$="" THEN 100
               ' last name
            40 INPUT "First Name";FN$  ' first name
            50 INPUT "Birthday/Anniversary Key (A/B)";KY$  ' key
            60 INPUT "Date";DT$  ' date
            70 INPUT "Color";CL$  ' favorite color
            80 PRINT #1, USING F$;LN$,FN$,KY$,DT$,CL$  ' formatted
               print to file
            90 GOTO 30  ' do it again
            100 CLOSE #1  ' finished
            110 END

            Now load the file produced by this program into TEXT and
        see how nicely the records and fields are set up.  This file
        should be a snap to read when we want to look for some data in
        it.
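
            As a preview, here is a rough sketch of reading the file
        back.  It assumes the field widths set by the format string in
        line 10 above (10, 10, 2, 8, and 8 characters, for a record of
        38 characters) and assumes LINE INPUT # and EOF work on a CDOS
        disk file as they do on a RAM file.  Each record is read into
        the illustrative variable R$ and cut apart with MID$.

```basic
10 REM Sketch: read fixed-length records and split the fields
20 OPEN "0:BDAY.DO" FOR INPUT AS 1
30 IF EOF(1) THEN 100
40 LINE INPUT #1,R$
50 LN$=MID$(R$,1,10):FN$=MID$(R$,11,10)
60 KY$=MID$(R$,21,2):DT$=MID$(R$,23,8)
70 CL$=MID$(R$,31,8)
80 PRINT FN$;LN$;KY$;DT$;" ";CL$
90 GOTO 30
100 CLOSE #1
110 END
```

            Because every field starts at a known position, the
        program never has to search the record for the boundary
        between one field and the next.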

            The PRINT USING statement makes it easy to mix strings and
        numbers when printing to the file.  Just make sure that you
        use the correct formatting character and allow enough room in
        the field for the expected data.  (Numbers longer than the
        field width allotted are prefixed with a %.)  Here's an example
        that prints three fields: a name, an age, and a salary.  Age
        and salary are integer variables:

            10 F$="\            \###$$####"
            100 PRINT #1, USING F$;NM$,AG,SA

            Try this, and notice that salary appears in the file with
        a dollar sign due to the "$" format character in its field.
        Also, fields don't need spaces between them because we know
        exactly where each starts and ends.  Numbers are actually
        printed to a sequential file as strings.  We could read the
        record produced by the above print statement with a single
        string variable, and use the techniques described in the next
        section for all-string data.  If, in the program that uses
        this data, it's more convenient to treat the data as numbers,
        the data that represent numbers will have to be converted
        with the VAL function after reading from the file.
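
            Here is a hedged sketch of that conversion.  It assumes
        the format string above yields a 14-character name, a
        3-character age, and a 6-character salary; check those widths
        against your own format string.  The sample record in line 20
        is made up for illustration and stands in for one read from
        the file.  VAL returns zero for a string that starts with a
        dollar sign, so the sketch uses INSTR to step past it first.

```basic
10 REM Sketch: split a record and convert the numeric fields
20 R$="Kent"+SPACE$(10)+" 35"+" $1200"
30 NM$=LEFT$(R$,14)
40 AG=VAL(MID$(R$,15,3))
50 S$=MID$(R$,18,6)
60 P=INSTR(S$,"$")
70 SA=VAL(MID$(S$,P+1))
80 PRINT NM$;AG;SA
```

            If the salary field has no dollar sign, INSTR returns zero
        and line 70 simply converts the whole field.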

            Don't use the sample programs here as models of
        programming style!  For example, there's no entry error
        checking.  The user could easily enter anything at the key
        prompt and two characters of whatever he/she entered would be
        printed to the file.  Likewise, there's no check to see if the
        date is entered in the desired format.  Use these programs
        only as examples of file handling technique, and fill in good
        programming style where needed.