Data File Compare
Data Management Series | http://www.simotime.com
When technology complements business | Copyright © 1987-2010 SimoTime Enterprises, All Rights Reserved
Reviewing the results obtained during a regression test that follows a system, application or programming change is one of the factors that drive a requirement for comparing data files. The scope of this effort depends on the type (or format) of the files being compared and the complexity of the record structure within each file. Comparing the files is only half of the effort. Deciding what to do once an error (or non-equal) condition occurs can be a significant part of the effort.
This document is an introduction or overview of the data file comparison challenges created as a result of doing an application or data migration between a Mainframe System and a Linux, UNIX or Windows (LUW) System running a Micro Focus sub-system such as Enterprise Server, Application Server or Net Express. This white paper describes the technique for generating a COBOL program (or a set of programs) that will compare two data files in a Micro Focus or Mainframe environment. The format of the files being compared may be sequential (fixed or variable length records) or VSAM. When a difference occurs the content of each record is displayed in hexadecimal, ASCII and/or EBCDIC.
During the testing phase of an application migration between a Mainframe System and a Linux, UNIX or Windows (LUW) System it is a requirement to compare the results of parallel test runs. Using a programmatic approach for data file comparison can be a quick and accurate way to compare large volumes of data.
Defining the scope of what is really needed for a "data file compare" during a parallel application cycle is the first step. For discussion purposes and to better understand the requirements we will make the following assumptions.
| Item | Assumption |
| 1. | Identical jobs may be executed on a z/OS Mainframe and an LUW platform with Micro Focus Enterprise Server using JCL. |
| 2. | We have the ability to download the mainframe files in their original EBCDIC-encoded formats. We are using File Transfer Protocol (FTP) in binary mode to transfer the data files. Note: Micro Focus Mainframe Access (MFA) could simplify the transfer process. |
| 3. | We will be running Micro Focus Studio (or Net Express) and Enterprise Server on a Windows platform using an ASCII-encoded environment. |
| 4. | For the purpose of this effort we are talking about VSAM Data Sets and traditional Sequential files in Mainframe and Micro Focus formats. |
The objective is to define a process and use technology that will let us have the flexibility of doing data file compares on an IBM Mainframe or an LUW System with Micro Focus. The following is a list of considerations that will add to the complexity and scope of effort.
| Item | Consideration |
| 1. | Data Scrubbing. |
| 2. | Converting between EBCDIC and ASCII while maintaining mainframe numeric integrity |
| 3. | Fields within records that contain date and time stamp information |
| 4. | Reformatting the fields within a record (i.e. changing the size of a field or adding/deleting a field) |
| 5. | Data Files that contain "Report-Oriented" record structures with Header, Footer, Detail and Summary information |
A special "Thank you" to Larry Simmons of Micro Focus for providing much of the information that is presented in this series of white papers and sample programs.
This section provides a list of questions that will aid in determining the scope of effort for creating the process to compare files and identify differences.
The following is used to determine the basic requirements for the data file comparison effort with a focus on the level of detail required when a difference occurs.
| 1. | Is it sufficient to just know the name of the files that are different? | Y or N |
| 2. | Do you need to identify the specific records and positions within a record where a difference occurs? | Y or N |
| 3. | Will you need the technology and/or process to provide enough information to refer back to the field name where a difference occurs? | Y or N |
| 4. | Do you want to track and identify differences with record inserts and deletes? | Y or N |
| 5. | Is the capability to display or print hexadecimal data required? | Y or N |
| 6. | Is the capability to display or print ASCII or EBCDIC characters required? | Y or N |
| 7. | Where will the file compare be executed (Mainframe, Windows or UNIX)? | ______ |
The following is used to determine the number of files and the file types and characteristics.
| 1. | How many Key-Sequenced-Data-Sets (KSDS or Indexed Files) do you have to be compared? | ______ |
| 2. | How many Sequential files do you have to be compared? | ______ |
| 3. | Do you have COBOL copy files that define the record layouts? | Y or N |
| 4. | Do you have Table Definitions as part of the record layouts? | Y or N |
| 5. | Do you use duplicate field names across group items (for example, FIELD-A of GROUP-01 and FIELD-A of GROUP-02)? | Y or N |
| 6. | Do you have packed (i.e. COMP-3) and binary (i.e. COMP) fields? | Y or N |
| 7. | Do you have signed, zoned-decimal fields? | Y or N |
| 8. | Do your files have Floating Point fields (i.e. COMP-1 or COMP-2)? | Y or N |
| 9. | Will you be using Line Sequential (i.e. ASCII/Text) files in the Windows environment? | Y or N |
| 10. | Will you be comparing report files? | Y or N |
| 11. | Will you be comparing source members? | Y or N |
| 12. | What is the encoding format of the data files (ASCII, EBCDIC or Both)? | A, E or B |
This section provides the basic information needed to get started with the data file comparison process.
| Item | Consideration |
| 1. | Do a Physical Comparison between two data files. The bytes within specified positions within a record must be equal at the bit level. |
| 2. | Do a Logical Comparison between two data files. The bytes within specified positions within a record must be equal at the character set level. |
| 3. | Do a Comparison with Compaction between two data files. The comparison is only done for significant bytes (usually non-space characters) within the specified positions within a record. |
| 4. | Accumulate totals for currency fields and/or other numeric fields within a record. |
| 5. | Accumulate record counts for both files being compared. |
The file comparison technology that is available from SimoTime Enterprises executes on the Windows platform. However, this technology does not actually do the file compares but generates a COBOL program that performs the file compares. This COBOL program may be compiled and executed on an IBM Mainframe (z/OS or VSE), a Windows platform with Micro Focus or a UNIX platform with Micro Focus.
It is important to set up a directory structure to generate and compile the file comparison programs. The file comparison programs are generally treated as tools for use by the development and testing groups within an organization.
Therefore, the generated programs for file compares are kept separate from the mainstream source code. This is discussed in the "Possibilities and Considerations" section of this document.
Using the SimoTime technology for file compares is a two-step process. The generation and compilation of the COBOL source code is the first step and this is a one-time process. The second step is the repeatable task of defining the positions within the records within the files to be compared and then executing the compare program.
The SimoTime technology offers two alternatives for defining the positions within the records that are going to be compared. The primary difference between the two alternatives is the point in the process where the positions to be compared are determined.
This type of file comparison program will simply read two files and compare positions within the records based on hard-coded values in the generated program. This type of program has the /COMPARE statements in the same control file (SYSCNTL) that is used to generate the COBOL source code. The positions to be compared are determined when the source code is generated and become part of that source code. The advantage of this approach is that fewer parameters are required at execution time. The disadvantage is that the program must be regenerated and compiled if a user wants to change the positions within the records to be compared. A Windows Command Script is provided and is executed as follows.
C:\> ZAPSCOMP name-of-control-file.txt name-of-generated-cobol.cbl
The following is an example of a control (or specifications) file that will generate the COBOL source code that will compare the contents of two files based on the /COMPARE statements in a control file (SYSCNTL). The compare positions will be included in the generated COBOL source code.
/Dialect   C2
/progid    ITCOMPC1
/sysut1    name=ITCOMPD1 org=Indexed recfm=variable rlen=512 klen=12 kpos=1
/sysut2    name=ITCOMPD2 org=Indexed recfm=variable rlen=512 klen=12 kpos=1
*
/DELTAMAX  250 EOF
/KEYFIELD  SYSUT1 pos 1 len 12 SYSUT2 pos 1 len 12
*
/COMPARE   SYSUT1 pos 1 len 94 SYSUT2 pos 1 len 94
*
/DFORMAT   ASC HEX EBC
*
/END
Once the COBOL source code has been generated it needs to be compiled. For this example we will use Micro Focus COBOL on a Windows platform to compile the program. We will compile to a .GNT for improved performance over .INT and use the ASSIGN(EXTERNAL) directive for mapping file names when we execute the program. The following is an example of a Windows command file (or batch file) that will set the environment and execute (or run) the program.
@echo OFF
rem *
set SYSLOG=c:\SimoDemo\TestLib1\DataWrk1\SYSLOGT1.DAT
set ITCOMPD1=C:\SimoDEMO\TestLib1\DataAsc1\ItemAsc1.DAT
set ITCOMPD2=C:\SimoDEMO\TestLib1\DataAsc1\ITEXPECT.DAT
if exist %SYSLOG% erase %SYSLOG%
rem *
run ITCOMPC1
rem *
pause
This type of file comparison program will read a control file (SYSUT3) containing /COMPARE statements at execution time and do positional comparisons within the records based on the /COMPARE statements submitted at execution time. The advantage of this approach is the compare positions are defined when the generated COBOL program is executed. The user may change the positions within the record to be compared without having to re-generate and compile the program. The disadvantage is that it is necessary to have a control file with compare statements at execution time.
WIP...
This section provides detailed information about the features provided by the SimoTime Technology. When doing file comparisons the default option should be to do a physical file compare with record counts reinforced by the accumulation and comparison of totals for currency fields.
For currency fields (and possibly other signed numeric fields) SimoTime recommends complementing the comparison process by using the accumulate function to accumulate totals for comparison. This has the added value of catching things like embedded spaces in signed packed fields.
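The following is a minimal sketch, in Python rather than the generated COBOL, of how accumulating a packed (COMP-3) currency field can expose a damaged value that a positional compare alone might not explain. The function names `unpack_comp3` and `accumulate`, the two-decimal scale and the sample field position are assumptions made for illustration and are not part of the SimoTime technology.

```python
from decimal import Decimal


def unpack_comp3(field: bytes, scale: int = 2) -> Decimal:
    """Decode a packed-decimal (COMP-3) field; raise ValueError if it is invalid."""
    digits = []
    for byte in field[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(field[-1] >> 4)          # last byte: one digit plus the sign nibble
    sign = field[-1] & 0x0F
    # Digit nibbles must be 0-9 and the sign nibble must be A-F.
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"invalid packed field: {field.hex().upper()}")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):               # B and D are the negative sign nibbles
        value = -value
    return Decimal(value).scaleb(-scale)


def accumulate(records, pos: int, length: int, scale: int = 2) -> Decimal:
    """Total one packed field (1-based pos, len) across all records."""
    start = pos - 1
    total = Decimal(0)
    for rec in records:
        total += unpack_comp3(rec[start:start + length], scale)
    return total


if __name__ == "__main__":
    good = [bytes.fromhex("00329C"), bytes.fromhex("00698C")]
    print(accumulate(good, 1, 3))          # total of 3.29 and 6.98
    try:
        unpack_comp3(b"\x40\x40\x40")      # a field overwritten with EBCDIC spaces
    except ValueError as err:
        print(err)
```

A field that has been overwritten with spaces has no valid sign nibble, so it is rejected here instead of being silently added to the total. Comparing the totals from both files then gives a second, independent check on the currency fields.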
The SimoTime technology offers two alternatives for defining the starting position and length of strings within the records that are going to be compared. The primary difference between the two alternatives is the point in the process where the positions and lengths to be compared are determined.
This file comparison program will simply read two files and compare positions within the records based on hard-coded values in the generated program. This type of program has the /COMPARE statements in the same control file (SYSCNTL) that is used to generate the COBOL source code. The positions to be compared are determined when the source code is generated and become part of that source code. The advantage of this approach is that fewer parameters are required at execution time. The disadvantage is that the program must be regenerated and compiled if a user wants to change the positions within the records to be compared.
This file comparison program will read a control file (SYSUT3) containing /COMPARE statements at execution time and do positional comparisons within the records based on the /COMPARE statements submitted at execution time. The advantage of this approach is the compare positions are defined when the generated COBOL program is executed. The user may change the positions within the record to be compared without having to re-generate and compile the program. The disadvantage is that it is necessary to have a control file with compare statements at execution time.
The SimoTime technology offers various types of comparison methodologies. The compare methodology used to do a data file compare will depend on the record content and structure.
A physical compare is the primary technique for comparing files. This compare is performed by specified positions within each record. The bit-pattern of the specified positions within each record within the range of positions must match to produce an equal result. To do a physical compare the following shows an example of a "/COMPARE" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPARE   PHYSICAL SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
or
/COMPARE   SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
The "/COMPARE " keyword followed by space characters must be in columns 1 through 11. The remaining specifications must start in column 12. If the "COMPARE Type" (i.e. PHYSICAL) keyword is missing then "PHYSICAL" is assumed or used as the default value. Multiple COMPARE statements may be used. Each COMPARE statement has a maximum length of 1,024 bytes.
This is a singular test that does a one-to-one comparison of the bytes within the specified positions. The bytes must match at the bit level for an equal condition to occur.
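As an illustration only, the byte-for-byte test described above can be sketched in Python as follows. The function name `physical_compare` is hypothetical, and treating bytes that exist in only one record as differences is an assumption; the generated COBOL program may handle short records differently.

```python
def physical_compare(rec1: bytes, rec2: bytes, pos: int, length: int) -> list[int]:
    """Return the 1-based positions at which the two records differ.

    pos and length mirror "pos 1 len 303" in a /COMPARE statement.
    """
    start = pos - 1
    a = rec1[start:start + length]
    b = rec2[start:start + length]
    # One-to-one comparison: the bytes must match at the bit level.
    diffs = [pos + i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # Assumption: bytes present in only one record count as differences.
    shorter, longer = sorted((len(a), len(b)))
    diffs.extend(range(pos + shorter, pos + longer))
    return diffs


if __name__ == "__main__":
    print(physical_compare(b"000000000001Rotor", b"000000000001Rotar", 1, 17))
```

An empty list means the specified positions are equal; otherwise the list gives the positions to report.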
A logical compare is an alternate technique for comparing files. This compare is performed by specified positions within each record. The bit-pattern or character-set of the specified positions within each record must match to produce an equal result. To do a logical compare the following shows an example of a "/COMPARE" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPARE   LOGICAL SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
The "/COMPARE " keyword followed by space characters must be in columns 1 through 11. The remaining specifications must start in column 12. To do a LOGICAL compare the "COMPARE Type" (i.e. LOGICAL) keyword must be specified. Multiple COMPARE statements may be used. Each COMPARE statement has a maximum length of 1,024 bytes.
This is a multiple test that does a one-to-two-possibility, byte-by-byte comparison. The logical compare will first test a byte for an exact match and if equal proceed to the next byte. If not equal the logical compare will then test the byte for a match by character set and if equal proceed to the next byte or if not equal set the mismatch (or non-equal) flag.
This compare methodology may be used to compare an ASCII-encoded file with an EBCDIC-encoded file without having to do a file conversion of one of the files. The first compare (or exact match) will catch the packed and binary fields. The second compare (or character set) will compare a logical ASCII character to a logical EBCDIC character.
This compare methodology will provide a high level of reasonability checking. However, there is a possibility that a signed decimal number that was converted improperly could be missed.
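The two-test logic of a logical compare can be sketched in Python as shown below. This is a sketch under stated assumptions, not the generated COBOL code: the function name `logical_compare` is hypothetical and the EBCDIC code page is assumed to be `cp037`, which the source does not specify.

```python
def logical_compare(ascii_rec: bytes, ebcdic_rec: bytes, pos: int, length: int,
                    codepage: str = "cp037") -> list[int]:
    """Return the 1-based positions that fail both the exact and the character test."""
    start = pos - 1
    a = ascii_rec[start:start + length]
    e = ebcdic_rec[start:start + length]
    diffs = []
    for i, (x, y) in enumerate(zip(a, e)):
        if x == y:
            continue      # test 1: exact match (covers packed and binary bytes)
        if bytes([y]).decode(codepage) == bytes([x]).decode("latin-1"):
            continue      # test 2: same character in the two encodings
        diffs.append(pos + i)
    return diffs


if __name__ == "__main__":
    packed = bytes.fromhex("00329C")
    ascii_rec = b"Rotor" + packed
    ebcdic_rec = "Rotor".encode("cp037") + packed
    print(logical_compare(ascii_rec, ebcdic_rec, 1, 8))   # no differences expected
```

Because a byte passes if either test succeeds, a byte that is wrong in one file can still be accepted when it happens to satisfy the other test. This is the reason the totals check is recommended as a complement.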
This compare methodology is intended to address a specific requirement that may exist when a user switches the report generator being used within an application. A new report generator or a custom program that replaces a utilitarian program may produce the same logical report but the actual data items may be shifted a few bytes to the left or right.
A compare using compaction is an alternate technique for comparing records within two files. This compare using compaction is performed by specified positions within each record. The bit-pattern of the non-space characters of the specified positions within each record must match to produce an equal result. To do a compare using compaction the following shows an example of a "/COMPACT" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPACT   SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
Multiple COMPACT statements may be used. Each COMPACT statement has a maximum length of 1,024 bytes.
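A minimal Python sketch of the compaction idea follows. The function name `compact_compare` is hypothetical, and the space character is passed as a parameter because it differs by encoding (0x20 in ASCII, 0x40 in EBCDIC).

```python
def compact_compare(rec1: bytes, rec2: bytes, pos: int, length: int,
                    space: bytes = b" ") -> bool:
    """Compare only the non-space bytes within the specified positions."""
    start = pos - 1
    a = rec1[start:start + length].replace(space, b"")
    b = rec2[start:start + length].replace(space, b"")
    return a == b


if __name__ == "__main__":
    old_report = b" TOTAL   125.00 "
    new_report = b"TOTAL 125.00    "
    print(compact_compare(old_report, new_report, 1, 16))   # shifted text, same content
```

Removing every space also hides differences that consist only of spacing, so this technique suits print-line images better than fixed-field data records.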
The "OMIT" function provides a method to omit or bypass records from the compare process. The "/UT1OMIT" and "/UT2OMIT" statements are used to conditionally determine if a record should be omitted or bypassed by the comparison process. The most common use of the omit function is to bypass blank records from report-oriented files. To omit blank records from both files requires the following two statements in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/UT1OMIT   if pos 2 len 132 EQ SPACES
/UT2OMIT   if pos 2 len 132 EQ SPACES
For the preceding two statements the action is not explicitly defined and will default to "bypass record".
To conditionally omit more than a single record from the comparison process would require the following two statements.
....:....1....:....2....:....3....:....4....:....5....:....6....
/UT1OMIT   if pos 1 len 1 EQ '1' bypass record +1
/UT2OMIT   if pos 1 len 1 EQ '1' bypass record +1
For the preceding two statements the action must be explicitly defined as "bypass record". The "+nnn" defines the number of additional records to bypass.
The omit function is only supported by the compile time function.
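The effect of the omit statements can be sketched in Python as a filter applied to each file before the compare. The function name `omit_records` is hypothetical. Whether the omit condition is re-tested on the additional bypassed records is not stated in the source; this sketch assumes it is not.

```python
from typing import Iterable, Iterator


def omit_records(records: Iterable[bytes], pos: int, length: int,
                 value: bytes, extra: int = 0) -> Iterator[bytes]:
    """Yield records, bypassing each record whose field equals value
    plus `extra` following records (the "+nnn" count)."""
    start = pos - 1
    skip = 0
    for rec in records:
        if skip:
            skip -= 1                     # one of the additional records to bypass
            continue
        if rec[start:start + length] == value:
            skip = extra
            continue
        yield rec


if __name__ == "__main__":
    report = [b"1HEADER", b" DATE 2005/04/01", b" LINE A", b" LINE B"]
    # Equivalent in spirit to: if pos 1 len 1 EQ '1' bypass record +1
    for line in omit_records(report, 1, 1, b"1", extra=1):
        print(line)
```

Applying the same filter to both inputs keeps the two record streams aligned when headers or blank lines are dropped.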
This step is usually a one-time process (this example uses the option to define positions to be compared at execution time). To generate a file comparison program requires a specification file to be provided to the SimoTime technology that will generate the file comparison source code. The following is an example of the records within an ASCII/Text file that will be required.
***********************************************************************
*  This is an example of the compare specifications to generate a     *
*  Data File Comparison Program. This is used by SimoZAPS             *
*  SimoTime Enterprises, LLC                                          *
*  (C) Copyright 1987-2010 All Rights Reserved                        *
*  Web Site URL: http://www.simotime.com                              *
*  e-mail: helpdesk@simotime.com                                      *
***********************************************************************
/Dialect   C2
/progid    ITUTL3C1
/SYSUT1    name=SYSUT1 org=Indexed recfm=variable rlen=512 klen=12 kpos=1
/SYSUT2    name=SYSUT2 org=Indexed recfm=variable rlen=512 klen=12 kpos=1
/SYSUT3    name=SYSUT3 org=ASCII/Text recfm=variable rlen=080
*
/DELTAMAX  250 EOF
/KEYFIELD  SYSUT1 pos 1 len 12 SYSUT2 pos 1 len 12
*COMPARE   SYSUT1 pos 1 len 512 SYSUT2 pos 1 len 512
*COMPARE   uses SYSUT3 ...
*
/DFORMAT   ASC HEX EBC
*
/END
For detailed information about the statements in the preceding file refer to the SimoTime documentation for file compares at http://www.simotime.com/simozaps.htm#Compare2 .
The following is an example of a Windows command file (or batch file) that will set the environment and execute (or run) the program.
@echo OFF
rem *
set SYSLOG=c:\SimoDemo\TestLib1\DataWrk1\SYSLOGT1.TXT
set SYSUT1=C:\SimoDEMO\TestLib1\DataAsc1\ITEMASC1.Dat
set SYSUT2=C:\SimoDEMO\TestLib1\DataAsc1\ITEXPECT.DAT
set SYSUT3=c:\SimoDemo\TestLib1\DataTxt1\ITUTL301.TXT
if exist %SYSLOG% erase %SYSLOG%
rem *
run ITUTL3C1
rem *
if exist %SYSLOG% start Notepad %SYSLOG%
rem *
pause
For this example the information about the job execution and possible differences in the files being compared will be written to the SYSLOG file.
For this example the control file (SYSUT3) contains the following compare statement. Since the maximum record size for the file is 512 bytes this will compare the entire record content.
/COMPARE SYSUT1 pos 1 len 512 SYSUT2 pos 1 len 512
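To make the structure of a compare statement concrete, the following Python sketch parses one statement into its parts. It is a hypothetical parser written for illustration, not the logic used by the generated program. It accepts any amount of white space between tokens and does not enforce the column rules described earlier.

```python
import re

# Optional compare type, then position and length for each file.
_COMPARE = re.compile(
    r"^/COMPARE\s+(?:(PHYSICAL|LOGICAL)\s+)?"
    r"SYSUT1\s+pos\s+(\d+)\s+len\s+(\d+)\s+"
    r"SYSUT2\s+pos\s+(\d+)\s+len\s+(\d+)\s*$",
    re.IGNORECASE,
)


def parse_compare(line: str) -> dict:
    """Split a /COMPARE statement into type, and (pos, len) for each file."""
    m = _COMPARE.match(line)
    if m is None:
        raise ValueError(f"not a /COMPARE statement: {line!r}")
    kind, p1, l1, p2, l2 = m.groups()
    return {
        "type": (kind or "PHYSICAL").upper(),   # PHYSICAL is the documented default
        "sysut1": (int(p1), int(l1)),
        "sysut2": (int(p2), int(l2)),
    }


if __name__ == "__main__":
    print(parse_compare("/COMPARE SYSUT1 pos 1 len 512 SYSUT2 pos 1 len 512"))
```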
What to do when a not equal condition occurs can be challenging on a single platform but in today's environment with multiple platforms and a mix of encoding schemes (such as EBCDIC and ASCII) and numeric formats (such as PACKED, BINARY and SIGNED-ZONE-DECIMAL) the task can become time consuming and difficult.
When a difference is found the type of display or logging information is defined by the use of a /DFORMAT statement in the control file used during program generation. When a not equal condition is encountered the following is displayed to the screen and written to a logging file. The following /DFORMAT statement was used in this example and will cause the compare program to dump the possible ASCII Text, the hexadecimal values and the possible EBCDIC text.
/DFORMAT ASC HEX EBC
The RED shows the possible ASCII translation. The BLUE shows the possible EBCDIC translation. The BLACK shows the hexadecimal dump information on two lines (high nibble on line 1, low nibble on line 2). The GREEN shows reference information about each file. For example, the relative record number is displayed along with the position and length of the text string within the record that was compared. The MAROON row shows the positions that are equal (=) or not equal (#). The YELLOW vertical column highlights the differences. The following shows sample output of the compare program when a difference occurs.
*** 2005/04/01 08:57:37:40 Starting - Data File Content Comparison by SimoTime Enterprises, LLC
*** 2005/04/01 08:57:37:41 SYSUT1.....Record Number(position:length) 000000001(00001:00094)
*** 2005/04/01 08:57:37:41 ....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8....:....9....
*** 2005/04/01 08:57:37:42 000000000001Distributor Cap                                 ........Each            ...2....i.
*** 2005/04/01 08:57:37:42 3333333333334677766776724672222222222222222222222222222222220000000046662222222222220003900068
*** 2005/04/01 08:57:37:42 000000000001493429254F203100000000000000000000000000000000000000000051380000000000000002C0009C
*** 2005/04/01 08:57:37:42 .....................?.../.........................................../........................
*** 2005/04/01 08:57:37:43 SYSUT2.....Record Number(position:length) 000000001(00001:00094)
*** 2005/04/01 08:57:37:43 ....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8....:....9....
*** 2005/04/01 08:57:37:43 000000000001Distributor Cap                                 ...$....Each            ...2....i.
*** 2005/04/01 08:57:37:43 3333333333334677766776724672222222222222222222222222222222220002000046662222222222220003900068
*** 2005/04/01 08:57:37:44 000000000001493429254F203100000000000000000000000000000000000004000C51380000000000000002C0009C
*** 2005/04/01 08:57:37:44 .....................?.../.........................................../........................
*** 2005/04/01 08:57:37:44 ===============================================================#===#==========================
*** 2005/04/01 08:57:37:45 *
*** 2005/04/01 08:57:37:45 SYSUT1.....Record Number(position:length) 000000002(00001:00094)
*** 2005/04/01 08:57:37:45 ....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8....:....9....
*** 2005/04/01 08:57:37:45 000000000002Rotor                                           ........Each            ........)\
*** 2005/04/01 08:57:37:46 3333333333335676722222222222222222222222222222222222222222220000000046662222222222220000800025
*** 2005/04/01 08:57:37:47 0000000000022F4F200000000000000000000000000000000000000000000000000051380000000000000009C0009C
*** 2005/04/01 08:57:37:47 .............?.?...................................................../.......................*
*** 2005/04/01 08:57:37:48 SYSUT2.....Record Number(position:length) 000000002(00001:00094)
*** 2005/04/01 08:57:37:48 ....:....1....:....2....:....3....:....4....:....5....:....6....:....7....:....8....:....9....
*** 2005/04/01 08:57:37:49 000000000002Rotor                                           ...$....Each            ........)\
*** 2005/04/01 08:57:37:49 3333333333335676722222222222222222222222222222222222222222220002000046662222222222220000800025
*** 2005/04/01 08:57:37:49 0000000000022F4F200000000000000000000000000000000000000000000004000C51380000000000000009C0009C
*** 2005/04/01 08:57:37:50 .............?.?...................................................../.......................*
*** 2005/04/01 08:57:37:51 ===============================================================#===#==========================
*** 2005/04/01 08:57:37:52 *
*** 2005/04/01 08:57:37:53 Summary - Data File Content Comparison
*** 2005/04/01 08:57:37:53 000000005 - Record count for ITCOMPD1
*** 2005/04/01 08:57:37:54 000000005 - Record count for ITCOMPD2
*** 2005/04/01 08:57:37:55 000000002 - Unequal count
*** 2005/04/01 08:57:37:55 Finished - Data File Content Comparison by SimoTime Enterprises, LLC
The summary information shows the record counts and the number of unequal compares by record count.
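The layout of the dump lines can be reproduced with a short Python sketch. This is an approximation written for illustration: the function names `dump_record` and `marker_row` are hypothetical, and the EBCDIC code page is assumed to be `cp037`. Non-printable bytes are shown as a dot in both translations.

```python
def _show(ch: str) -> str:
    """Return ch if it is a printable ASCII character, else a dot."""
    return ch if ch.isascii() and ch.isprintable() else "."


def dump_record(data: bytes, ebcdic_codepage: str = "cp037") -> list[str]:
    """Build the four dump lines: ASCII, high nibbles, low nibbles, EBCDIC."""
    ascii_line = "".join(_show(chr(b)) for b in data)
    high = "".join(f"{b >> 4:X}" for b in data)
    low = "".join(f"{b & 0x0F:X}" for b in data)
    ebcdic_line = "".join(_show(ch) for ch in data.decode(ebcdic_codepage))
    return [ascii_line, high, low, ebcdic_line]


def marker_row(a: bytes, b: bytes) -> str:
    """Flag each position as equal (=) or not equal (#)."""
    return "".join("=" if x == y else "#" for x, y in zip(a, b))


if __name__ == "__main__":
    rec1 = b"Rotor" + bytes(4)
    rec2 = b"Rotor" + bytes([0x00, 0x00, 0x00, 0x24])
    for line in dump_record(rec1) + dump_record(rec2) + [marker_row(rec1, rec2)]:
        print(line)
```

Reading a byte vertically from the two hexadecimal lines gives its value. For example, a high nibble of 2 over a low nibble of 4 is X'24', which is the dollar sign in the ASCII translation line.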
This section provides additional detail about the process for generating, compiling and executing a data file comparison program. This discussion is limited to the Windows environment. However, the process is very similar for the mainframe and UNIX environments once the COBOL source code has been generated on a Windows system.
The following is a sample sub-directory structure for managing the assets used by the file compare process. The following directory structure is usually stored under a higher-level directory.
| Name | Description |
| CobCpy1 | This directory contains COBOL copy files. The copy files are not required for the data file comparison but will be required to generate the HTML documentation. Also, in an "application Migration" between a mainframe and a Windows platform it will be necessary to convert one of the files between EBCDIC and ASCII formats. |
| COBOL | This directory is used to store the generated COBOL source code. |
| Compares | This directory is used to store the specifications file and command files that are used to do the generation of the data file compare programs. |
| Converts | This directory is used to store the specifications file and command files that are used to do the generation of the data file conversion programs. |
| DataAsc1 | This directory is used to store data files that are ASCII-encoded. |
| DataEbc1 | This directory is used to store data files that are EBCDIC-encoded. |
| DataTxt1 | This directory is used to store data files that are ASCII/Text files. |
| DataWrk1 | This directory is used to store data files that are used as temporary or working files. |
| DOCS | This directory is used to store user documents. |
| HTML | This directory is used to store the generated HTML documentation. |
| LOGS | This directory is used to write logging information about the code generation and compare processes. |
This is part of the basic requirements to ensure the files being compared will have an equal number of records. The SimoTime technology has an option to read to "End-of-File" or to "Quit" after a certain number of differences have been identified.
If the "End-of-File" option is used, differences are written to the log file until the difference count exceeds the maximum. After that point differences are no longer written to the log file, but reading of the two files continues until end-of-file is reached. A count of the total number of records read from each file is then provided along with a non-zero return code.
If the "Quit" option is used and the maximum number of differences is exceeded, the program terminates with a non-zero return code and provides a count of the number of records read up to that point.
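The difference between the two options can be sketched in Python as follows. The function name `run_compare`, the use of a whole-record compare and the return code value of 4 are assumptions made for illustration. The source states only that the return code is non-zero when the maximum is exceeded; this sketch returns non-zero whenever any difference is found.

```python
from itertools import zip_longest


def run_compare(recs1, recs2, delta_max: int, mode: str = "EOF") -> dict:
    """Compare two record lists; mode is "EOF" or "QUIT"."""
    read1 = read2 = unequal = 0
    logged = []
    for r1, r2 in zip_longest(recs1, recs2):
        read1 += r1 is not None
        read2 += r2 is not None
        if r1 == r2:
            continue
        unequal += 1
        if unequal <= delta_max:
            logged.append((read1, read2))   # still within the logging limit
        elif mode == "QUIT":
            break                           # stop reading once the limit is exceeded
    return_code = 0 if unequal == 0 else 4  # assumed non-zero value
    return {"read1": read1, "read2": read2, "unequal": unequal,
            "logged": len(logged), "rc": return_code}


if __name__ == "__main__":
    a = [b"A", b"B", b"C", b"D"]
    b = [b"A", b"X", b"Y", b"Z"]
    print(run_compare(a, b, delta_max=1, mode="EOF"))
    print(run_compare(a, b, delta_max=1, mode="QUIT"))
```

With "EOF" the record counts cover both files completely, which is what makes the equal-record-count check possible. With "QUIT" the counts show only how far the program read before it stopped.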
The SimoTime technology has the capability of tracking record inserts and deletes for Indexed files (or VSAM, Key-Sequenced-Data-Sets). This can also be done for sequential files if they are in sequence by an identified field that can be used as a key field.
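Tracking inserts and deletes relies on both files being in key sequence. A minimal Python sketch of this matching logic is shown below. The function name `keyed_compare` and the result labels are hypothetical, and the keys are compared as raw bytes.

```python
def keyed_compare(recs1, recs2, kpos: int, klen: int) -> list:
    """Match records by key; both inputs must be in ascending key sequence."""
    start = kpos - 1

    def key(rec: bytes) -> bytes:
        return rec[start:start + klen]

    i = j = 0
    result = []
    while i < len(recs1) and j < len(recs2):
        k1, k2 = key(recs1[i]), key(recs2[j])
        if k1 == k2:
            if recs1[i] != recs2[j]:
                result.append(("changed", k1))
            i += 1
            j += 1
        elif k1 < k2:
            result.append(("only in SYSUT1", k1))   # missing from the second file
            i += 1
        else:
            result.append(("only in SYSUT2", k2))   # missing from the first file
            j += 1
    result += [("only in SYSUT1", key(r)) for r in recs1[i:]]
    result += [("only in SYSUT2", key(r)) for r in recs2[j:]]
    return result


if __name__ == "__main__":
    file1 = [b"001A", b"002B", b"004D"]
    file2 = [b"001A", b"003C", b"004X"]
    for entry in keyed_compare(file1, file2, 1, 3):
        print(entry)
```

Advancing only the file with the lower key lets the compare resynchronize after an insert or delete, so one missing record does not cause every later record to be reported as different.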
The "OMIT" function provides a method to omit or bypass records from the compare process. The "/UT1OMIT" and "/UT2OMIT" statements are used to conditionally determine if a record should be omitted or bypassed by the comparison process. The most common use of the omit function is to bypass blank records from report-oriented files. To omit blank records from both files requires the following two statements in the specifications file.
/UT1OMIT   if pos 2 len 132 EQ SPACES
/UT2OMIT   if pos 2 len 132 EQ SPACES
To conditionally omit more than a single record from the comparison process would require the following two statements.
/UT1OMIT   if pos 1 len 1 EQ '1' bypass record +1
/UT2OMIT   if pos 1 len 1 EQ '1' bypass record +1
The omit function is only supported by the compile time function.
The SimoTime Technology provides a variety of techniques to compare the content of two files. A "PHYSICAL" compare requires a match at the bit-pattern level. A "LOGICAL" compare requires a match of the bit-patterns or character-sets. A compaction technique may be used to compact the space characters and only include non-space characters within each record as part of the compare process.
A physical compare is the primary technique for comparing files. This compare is performed by specified positions within each record. The bit-pattern of the specified positions within each record must match to produce an equal result. To do a physical compare the following shows an example of a "/COMPARE" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPARE   PHYSICAL SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
If the "COMPARE Type" keyword is missing then physical is assumed or used as the default value.
A logical compare is an alternate technique for comparing files. This compare is performed by specified positions within each record. The bit-pattern or character-set of the specified positions within each record must match to produce an equal result. To do a logical compare the following shows an example of a "/COMPARE" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPARE   LOGICAL SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
A compare using compaction is an alternate technique for comparing records within two files. This compare using compaction is performed by specified positions within each record. The bit-pattern of the non-space characters of the specified positions within each record must match to produce an equal result. To do a compare using compaction the following shows an example of a "/COMPACT" statement that would be included in the specifications file.
....:....1....:....2....:....3....:....4....:....5....:....6....
/COMPACT   SYSUT1 pos 1 len 303 SYSUT2 pos 1 len 303
The SimoTime technology has the capability of reading a COBOL copy file that defines a record layout and providing HTML documentation. The HTML documentation includes the field name, the relative position of the field within the data structure, the logical length of the fields (number of digits) and the physical field length (actual bytes of memory used).
When the comparison program is executed information is provided when a difference occurs by indicating the relative position within the record where a difference occurred. By referencing the HTML document it is a simple process to identify the field name where the difference occurred.
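The lookup from a reported position back to a field name can be sketched in Python. The layout table below is entirely hypothetical; in practice the field names, positions and physical lengths would come from the HTML documentation generated from the COBOL copy file.

```python
from bisect import bisect_right

# Hypothetical layout: (field name, 1-based start position, physical length)
LAYOUT = [
    ("ITEM-NUMBER", 1, 12),
    ("ITEM-DESCRIPTION", 13, 48),
    ("ITEM-QUANTITY", 61, 8),
]


def field_at(layout, position: int):
    """Return the name of the field that contains a 1-based position, or None."""
    starts = [start for _, start, _ in layout]
    idx = bisect_right(starts, position) - 1   # last field starting at or before position
    if idx >= 0:
        name, start, length = layout[idx]
        if position < start + length:
            return name
    return None


if __name__ == "__main__":
    for pos in (64, 68):
        print(pos, field_at(LAYOUT, pos))
```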
Sequential files that contain print line images can present a challenge when attempting to compare. Reports (or print-oriented files) usually contain multiple record types without a field that identifies the record type or print line. It is quite common for header information to contain a date and possibly a time stamp, and across testing cycles these fields are rarely equal (especially the time field). It can be difficult to determine whether a record (print line) is a header, a footer, a detail print line, a sub-total line or a total line.
It is very important to determine up front how many of the files to be compared are print-oriented and what the structure of these files is. Once the file format is understood it is usually a straightforward process to add logic to the generated COBOL program to adjust to the multiple record types and to bypass the date and time fields. For example, if a header contains three lines of information the COBOL compare program may be modified to look for the "skip to line 1" character in position 1 of the 133 byte print line and then bypass the compare for three records (or the number of lines of header information).
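The header-bypass logic described above can be sketched as follows. This is a Python illustration rather than the modified COBOL program, and it assumes that the "skip to line 1" control character is "1" in position 1 and that each header is three lines long; both values are parameters so they can be adjusted.

```python
def comparable_lines(lines, header_lines=3, skip_char="1"):
    """Yield (line number, line) for print lines that should be compared.

    A line with the skip-to-line-1 character in position 1 starts a header;
    that line and the rest of the header are bypassed.
    """
    to_skip = 0
    for number, line in enumerate(lines, start=1):
        if line[:1] == skip_char:
            to_skip = header_lines
        if to_skip > 0:
            to_skip -= 1
            continue
        yield number, line


def differing_lines(report1, report2, header_lines=3):
    """Return the line numbers (from report1) of unequal non-header lines."""
    pairs = zip(comparable_lines(report1, header_lines),
                comparable_lines(report2, header_lines))
    return [n1 for (n1, l1), (_, l2) in pairs if l1 != l2]


r1 = ["1REPORT A   2010-01-01", " RUN TIME 10:00", " ------",
      " ITEM 1  5.00", " ITEM 2  6.00"]
r2 = ["1REPORT A   2010-01-02", " RUN TIME 11:30", " ------",
      " ITEM 1  5.00", " ITEM 2  6.50"]
print(differing_lines(r1, r2))
```

The sketch pairs lines by order only, so it does not detect a report that has extra or missing lines; that check would need to be added for real use.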
In some cases the positioning of the text strings within a print line image may be shifted to the left or right. The content of the text strings is equal but a record compare is not equal because of the positional shift. The SimoTime technology provides an alternative that uses a "Compaction" approach to do a left-to-right comparison that ignores the embedded spaces. Replacing the /COMPARE statement in the specifications file with a /COMPACT statement causes the compaction comparison to be used.
When comparing a sequential file that is downloaded from a Mainframe System (EBCDIC-encoded) with a file that is created on a Windows System (ASCII-encoded) it may be necessary to make adjustments for the difference in the ASCII and EBCDIC collating sequences. This is especially true if the field that determines the sequence of the file is alpha-numeric.
If the compare is being done on the Windows System then the file downloaded from the mainframe (via FTP in binary mode) will need to be converted to ASCII and then sorted prior to doing a compare.
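The convert-then-sort step can be illustrated with a short Python sketch. It assumes fixed-length records that contain only text (packed or binary fields would be corrupted by a whole-record translation) and uses Python's built-in `cp037` codec as a stand-in for whatever EBCDIC code page the mainframe actually uses. The function names and the key position are hypothetical.

```python
def split_fixed(data: bytes, lrecl: int):
    """Split a binary-mode download into fixed-length records."""
    return [data[i : i + lrecl] for i in range(0, len(data), lrecl)]


def ebcdic_to_ascii_sorted(data: bytes, lrecl: int,
                           key_pos: int, key_len: int, codec: str = "cp037"):
    """Translate text-only EBCDIC records to ASCII, then sort by key."""
    records = [
        rec.decode(codec).encode("ascii", errors="replace")
        for rec in split_fixed(data, lrecl)
    ]
    key = slice(key_pos - 1, key_pos - 1 + key_len)
    # Sorting the translated bytes gives the ASCII collating sequence.
    return sorted(records, key=lambda rec: rec[key])


# In EBCDIC order, lower case sorts before upper case, and digits sort last.
mainframe_file = "a100A1001A00".encode("cp037")
print(ebcdic_to_ascii_sorted(mainframe_file, lrecl=4, key_pos=1, key_len=2))
```

After the translation the same three records sort as digits, then upper case, then lower case, which is why a file that was in key sequence on the mainframe may need to be re-sorted before a record-by-record compare.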
It may be a requirement to accumulate batch totals for numeric fields and this is especially true for currency fields. The SimoTime technology allows a user to leverage various approaches for this requirement.
| Approach | Description |
| 1. | SimoTime provides the technology to do numeric checking and accumulation of totals for numeric fields based on a record layout and the field definitions provided in a COBOL copy file. |
| 2. | SimoTime technology generates COBOL source code that is compiled and then used to do the data file comparison. This COBOL source code may be easily modified to accumulate the batch totals or address very specialized requirements. |
| 3. | Also, the batch totals for currency fields may already be available in reports that are produced by the application. |
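As a rough illustration of numeric checking combined with accumulation, the Python sketch below totals an unsigned, ASCII-encoded display numeric field with two implied decimal places and reports the records whose field fails the numeric check. The field position, length and scale are assumptions for the example and do not come from a real copy file.

```python
from decimal import Decimal


def accumulate(records, pos: int, length: int, scale: int = 2):
    """Total a display numeric field and list records that fail the check.

    Returns (total, rejected) where rejected holds 1-based record numbers.
    """
    total = Decimal(0)
    rejected = []
    for number, rec in enumerate(records, start=1):
        field = rec[pos - 1 : pos - 1 + length]
        # Numeric check: the full field must be present and all digits.
        if len(field) == length and field.isdigit():
            total += Decimal(int(field)).scaleb(-scale)
        else:
            rejected.append(number)
    return total, rejected


records = [b"A0001250", b"B0000525", b"C   1000"]
print(accumulate(records, pos=2, length=7))
```

The third record has leading spaces in the numeric field, so it is reported rather than added to the total.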
During an application migration it is always tempting to scrub the data (for example, replace leading spaces in numeric fields with zeroes). If the data is scrubbed this usually leads to a number of data file compare differences during the application testing or parallel testing phases of the migration effort. Extra time will be needed to manually process and validate this information.
The programs generated to do data file comparison may be compiled and executed in an ASCII-encoded or EBCDIC-encoded environment. The generated compare programs call two additional programs, SIMOHEX4 and SIMOLOGS. SimoTime provides two DLLs that work in the ASCII-encoded Net Express environment. If the generated file compare programs are to be used on the mainframe, or in an EBCDIC-encoded environment with Micro Focus technology, the SIMOHEX4 and SIMOLOGS programs must be compiled to execute in these EBCDIC-encoded environments, and the OS/390 or later dialect should be used. The next challenge is to get the file compare program to write information to the SYSLOG file. The following JCL shows what needs to be done.
//A2E02KJ1 JOB SIMOTIME,ACCOUNT,CLASS=1,MSGCLASS=0,NOTIFY=CSIP1
//* *******************************************************************
//* THIS PROGRAM IS PROVIDED BY: *
//* SIMOTIME ENTERPRISES, LLC *
//* (C) COPYRIGHT 1987-2010 ALL RIGHTS RESERVED *
//* WEB SITE URL: HTTP://WWW.SIMOTIME.COM *
//* E-MAIL: HELPDESK@SIMOTIME.COM *
//* *******************************************************************
//*
//* TEXT - Execute a program that compares two data files
//* Author - SimoTime Enterprises
//* Date - September 20, 2007
//*
//* *******************************************************************
//* STEP 1, Delete the previously created SYSLOG file.
//*
//SYSLOGDT EXEC PGM=IEFBR14
//SYSLOG   DD  DSN=SIMOTIME.DATA.SYSLOG,DISP=(MOD,DELETE,DELETE),
//             STORCLAS=MFI,
//             SPACE=(TRK,5),
//             DCB=(RECFM=V,LRECL=1055,DSORG=PS)
//*
//* *******************************************************************
//* STEP 2, Allocate a new SYSLOG file.
//* Note: The LRECL is four (4) bytes bigger than the logical record
//*       that is defined in the program that writes to SYSLOG. This
//*       allows for the four (4) byte Record Descriptor Word (RDW)
//*       that is appended to the front of each record. The program
//*       defines the records as varying in size from 64 to 1,051.
//* Note: It is necessary to pre-allocate the SYSLOG file. The program
//*       that does the actual file compare will do an "OPEN EXTEND".
//*       If the file is not pre-allocated, an open error will be
//*       posted and the program will execute but will only display
//*       truncated information to SYSOUT.
//*
//SYSLOGCT EXEC PGM=IEFBR14
//SYSLOG   DD  DSN=SIMOTIME.DATA.SYSLOG,DISP=(NEW,CATLG,DELETE),
//             STORCLAS=MFI,
//             SPACE=(TRK,5),
//             DCB=(RECFM=V,LRECL=1055,DSORG=PS)
//*
//* *******************************************************************
//* STEP 3, Execute the compare program.
//* Note: when a difference occurs the information will be written to
//*       the SYSLOG file
//*
//STEP1CPR EXEC PGM=JGBT1
//COMPARE1 DD  DSN=SIMOTIME.DATA.DATKS02K,DISP=SHR
//COMPARE2 DD  DSN=SIMOTIME.DATA.DATKS02Z,DISP=SHR
//SYSLOG   DD  DSN=SIMOTIME.DATA.SYSLOG,DISP=SHR
//SYSOUT   DD  SYSOUT=*
//
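The relationship between the record sizes and the LRECL in the JCL (1,051 bytes of data plus a 4-byte RDW gives 1,055) can be seen by reading such a file off the mainframe. The Python sketch below assumes the SYSLOG file was transferred in binary with the RDWs retained and without block descriptor words; the function name is hypothetical. Each RDW holds the record length, including the RDW itself, as a big-endian halfword followed by two reserved bytes.

```python
import struct


def read_rdw_records(data: bytes):
    """Split a buffer of RDW-prefixed variable-length records."""
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {offset}")
        # First two bytes: total length of the record including the RDW.
        (rdw_len,) = struct.unpack_from(">H", data, offset)
        if rdw_len < 4 or offset + rdw_len > len(data):
            raise ValueError(f"bad RDW length {rdw_len} at offset {offset}")
        records.append(data[offset + 4 : offset + rdw_len])
        offset += rdw_len
    return records


# Smallest and largest records the JCL comments describe: 64 and 1,051 bytes.
sample = (struct.pack(">HH", 68, 0) + b"A" * 64 +
          struct.pack(">HH", 1055, 0) + b"B" * 1051)
print([len(r) for r in read_rdw_records(sample)])
```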
The file compare process can be automated through the use of Windows BAT or CMD files. The setup effort to do this is dependent on answers to the preceding questions and the application's sensitivity to the differences in the EBCDIC and ASCII collating sequences.
We are assuming the files will have the typical amount of date and time oriented fields that will have minor differences and that you may want to bypass or validity check these fields. Also, in some files there are filler items you may want to bypass in the compare process.
At one end of the spectrum the comparing of two sequential files that are in the same sequence and use the same encoding format is easy.
At the other end of the spectrum is the comparing of two Key-Sequenced Data Sets that use different encoding formats and have alphanumeric keys, alternate indexes, variable-length records and multiple record types with a mixture of packed and binary fields, while also tracking and identifying differences caused by record inserts and deletes. This can be a real challenge.
The Data File Comparison section of the SimoZAPS Reference Manual is available on the Internet and provides additional information about data file comparison technology.
Permission to use, copy, modify and distribute this software for any commercial purpose requires a fee to be paid to SimoTime Enterprises. Once the fee is received by SimoTime the latest version of the software will be delivered and a license will be granted for use within an enterprise, provided the SimoTime copyright notice appear on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Enterprises.
Permission to use, copy, modify and distribute this software for a non-commercial purpose and without fee is hereby granted, provided the SimoTime copyright notice appear on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Enterprises.
SimoTime Enterprises makes no warranty or representations about the suitability of the software for any purpose. It is provided "AS IS" without any express or implied warranty, including the implied warranties of merchantability, fitness for a particular purpose and non-infringement. SimoTime Enterprises shall not be liable for any direct, indirect, special or consequential damages resulting from the loss of use, data or projects, whether in an action of contract or tort, arising out of or in connection with the use or performance of this software.
The self-study session for comparing data files uses a generated COBOL program to compare two data files. The session includes a hands-on exercise that does an actual data file comparison.
Note: You must be attached to the Internet to download a Z-Pack or view the list.
This document is part of the Data File Management Series of white papers that discuss the transferring, sharing, converting and comparing tasks required when moving or sharing data between different systems.
The hexadecimal dump of the parameter-buffer uses the same technique as described in another SimoTime example that describes the dumping of a data string using COBOL. The member that does the actual hexadecimal dump is named SimoDUMP. A copy file (PASSDUMP.CPY) is provided for defining the pass area.
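For readers without access to SimoDUMP, the general idea of showing a data string in hexadecimal, ASCII and EBCDIC can be sketched in Python. This is not the SimoDUMP implementation or its output format; the function name is hypothetical and `cp037` is assumed as the EBCDIC code page.

```python
def dump_views(data: bytes, ebcdic_codec: str = "cp037"):
    """Return (hex, ascii, ebcdic) views of a byte string.

    Bytes with no printable ASCII character in a view are shown as ".".
    """
    hex_view = data.hex().upper()
    ascii_view = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)
    ebcdic_view = "".join(
        c if c.isascii() and c.isprintable() else "."
        for c in data.decode(ebcdic_codec)
    )
    return hex_view, ascii_view, ebcdic_view


# The EBCDIC bytes for "ABC" are unprintable when read as ASCII.
print(dump_views(b"\xC1\xC2\xC3"))
```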
The SimoZAPS Utility Program has the capability of generating a COBOL program that will do the conversion of sequential and VSAM (KSDS) files between EBCDIC and ASCII. SimoZAPS can also read a sequential file in EBCDIC format and create an ASCII/CRLF file or VSAM KSDS file in ASCII format. The conversion tables may be viewed or modified to meet unique requirements. The Hexcess/2 function provides the capability of viewing, finding or patching the contents of a file in hexadecimal.
This item will provide a link to an ASCII or EBCDIC translation table. A column for decimal, hexadecimal and binary is also included.
The following table is a list of white papers that provide more detailed information about the four common numeric formats used on an IBM Mainframe.
| Numeric Type | Description |
| Zoned Decimal | This document describes the zoned-decimal format. This is coded in COBOL as USAGE IS DISPLAY and is the default format if the USAGE clause is missing. |
| Packed Decimal | This document describes the packed-decimal format. This is coded in COBOL as USAGE IS COMPUTATIONAL-3 and is usually coded in its abbreviated form of COMP-3. |
| Binary | This document describes the binary format. This is coded in COBOL as USAGE IS COMPUTATIONAL and is usually coded in its abbreviated form of COMP. This may also be coded with the keyword BINARY. |
| Edited Numeric | This document describes the edited numeric format. This is coded in COBOL using an edit mask in the picture clause. An example would be PIC ZZZ.99+. |
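To make the first three formats concrete, the following Python sketch decodes packed-decimal, EBCDIC zoned-decimal and binary fields into integers. The function names are hypothetical, implied decimal places are ignored, and only the common sign conventions are handled (a final nibble of B or D means negative, any other sign nibble is treated as positive).

```python
def unpack_comp3(field: bytes) -> int:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    if not field:
        raise ValueError("empty field")
    digits = []
    for byte in field[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(field[-1] >> 4)
    sign = field[-1] & 0x0F
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


def unzone_ebcdic(field: bytes) -> int:
    """Decode EBCDIC zoned decimal: digit in each low nibble, sign in the
    zone (high nibble) of the last byte."""
    if not field:
        raise ValueError("empty field")
    digits = [b & 0x0F for b in field]
    if any(d > 9 for d in digits):
        raise ValueError("not a valid zoned-decimal field")
    value = int("".join(str(d) for d in digits))
    return -value if (field[-1] >> 4) in (0x0B, 0x0D) else value


def unpack_binary(field: bytes) -> int:
    """Decode a signed big-endian binary (COMP) field."""
    return int.from_bytes(field, byteorder="big", signed=True)


print(unpack_comp3(bytes.fromhex("00125D")),
      unzone_ebcdic(bytes.fromhex("F1F2C3")),
      unpack_binary(bytes.fromhex("FFFE")))
```

These fields are the reason a whole-record EBCDIC to ASCII translation is unsafe: the bytes of a packed or binary field are not characters and must be left untouched or converted field by field.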
Check out The VSAM - QSAM Connection for more examples of mainframe VSAM and QSAM accessing techniques and sample code.
This document provides a quick summary of the File Status Key for VSAM data sets and QSAM files.
Check out The SimoTime Library for a wide range of topics for Programmers, Project Managers and Software Developers.
To review all the information available on this site start at The SimoTime Home Page .
Check out The SimoTime Glossary for a list of terms and definitions used in the documents provided by SimoTime.
If you have any questions, suggestions or comments please call or send an e-mail to: helpdesk@simotime.com
We appreciate your comments and feedback.
Founded in 1987, SimoTime Enterprises is a privately owned company. We specialize in the creation and deployment of business applications using new or existing technologies and services. We have a team of individuals who understand the broad range of technologies being used in today's environments. This includes the smallest thin client using the Internet and the very large mainframe systems. There is more to making the Internet work for your company's business than just having a nice looking web site. It is about combining the latest technologies and existing technologies with practical business experience. It's about the business of doing business and looking good in the process. Quite often, reaching larger markets or providing a higher level of service to existing customers requires the newer Internet technologies to work in a complementary manner with existing corporate mainframe systems. Whether you want to use the Internet to expand into new market segments or as a delivery vehicle for existing business functions simply give us a call or check the web site at http://www.simotime.com
| Version 06.11.01 |