Data File Convert
Data Management Series
When technology complements business. Copyright © 1987-2012 SimoTime Enterprises. All Rights Reserved.
Ever since the second computer architecture was introduced, data conversion in preparation for data migration and data sharing has been a never-ending task. Data conversions may be driven by business requirements or by system requirements such as changes in system architecture. This document is an introduction to the data file conversion aspects of an application migration between a mainframe system and a Windows system running a Micro Focus sub-system such as Enterprise Server, Application Server or Net Express. The following discussion divides the data file conversion tasks into two categories.
Table: Categories for the Data File Conversion Tasks
This document, and the links to other documents, cover the following items.
Table: Topics of Focus for this Document
The data conversion process should be repeatable and should leave an audit or validation trail. It should be executable as an automated, unattended process; requiring operator input during the conversion introduces a point of exposure for errors.
A special "Thank you" to Larry Simmons of Micro Focus for providing much of the information that is presented in this series of white papers and sample programs.
There are two categories of data conversion requirements based upon the target environment configuration.
1. Migrate (or move) the application and then retire the application on the mainframe.
2. Migrate (or replicate) the application and then coexist with, or complement, the mainframe by sharing data and processes.
This section provides a list of questions that will aid in determining the scope of effort for creating the process to do data file conversions. It is important to provide platform flexibility as to where the conversion is done. The following questions should be answered at the beginning of the process to migrate an application and its associated data.
The following questions help determine the number of files, the file types and their characteristics.
1. How many Key-Sequenced Data Sets (KSDS, or indexed files) will be converted? ______
2. How many sequential files will be converted? ______
3. Do you have COBOL copy files that define the record layouts? (Y or N)
4. Do you use duplicate field names across group items (for example, FIELD-A of GROUP-01 and FIELD-A of GROUP-02)? (Y or N)
5. Do you have packed-decimal (COMP-3) or binary (COMP) fields? (Y or N)
6. Do you have signed zoned-decimal fields? (Y or N)
7. Do your files have floating point fields (COMP-1 or COMP-2)? (Y or N)
8. Will you be using Line Sequential (ASCII/text) files in the Windows environment? (Y or N)
The following questions help determine the basic requirements for the data file conversion effort.
1. Is there a requirement to do the conversion on the mainframe? (Y or N)
2. Is there a requirement to do the conversion on the server? (Y or N)
3. Is there a requirement to do the conversion on the client machine? (Y or N)
4. Is there a requirement to do the conversion during the transfer (download or upload) process? (Y or N)
5. Is there a requirement to do the conversion at the file level? (Y or N)
6. Is there a requirement to do the conversion at the record level? (Y or N)
7. Is there a requirement for the conversion routine to be a callable module? (Y or N)
8. Is there a requirement for the conversion routine to handle variable length records? (Y or N)
9. Is there a requirement for the conversion routine to handle multiple record types? (Y or N)
10. Is there a requirement for the conversion process to be used or adapted to convert other data structures (for example, IDMS, DataCom or Adabas)? (Y or N)
The following questions help determine the effort of transferring the data files.
1. File Transfer Protocol (FTP) will be used as the transfer medium. (Y or N)
2. Micro Focus Mainframe Access will be used as the transfer medium. (Y or N)
3. Machine-readable media will be used as the transfer medium. (Y or N)
4. Other comments: ______
This section provides additional detail about the process for generating, compiling and executing a data file conversion program. This discussion is limited to the Windows environment; however, the process is very similar for the mainframe and UNIX environments once the COBOL source code has been generated on a Windows system.
If the requirement is to do the conversion on a Windows client machine used for development and testing, and the data is limited to sequential or indexed files, then the Data File CONVerter (DFCONV) included in Net Express offers a cost-effective and convenient solution. However, Micro Focus does not include the DFCONV components in its run-time (or production) offerings (Application Server and Enterprise Server), so DFCONV cannot be fully implemented in production environments.
Note: Both DFCONV and the Data File Editor are delivered as technologies for a development environment running on a Windows System.
Note: DFCONV is available only on Windows; it is not available on UNIX.
In many situations the requirement goes beyond a simple file conversion performed on a Windows platform. The following requirements may call for more function than DFCONV provides.
Table: Possible Requirements List for Data File Conversion
The following is a reasonable three-step guideline for approaching a data file conversion effort.
Table: Three Possible Approaches to Data File Conversion
With the exception of files with variable length records that are dynamically created at execution time, a COBOL copy file that defines the record layout is usually available. If a copy file is not available, a COBOL working-storage definition can usually be cut and pasted to create one.
With DFCONV the COBOL copy file or a working storage definition may be used as a record definition for the data file conversion.
With SimoTime technologies, the COBOL copy file may be used to generate COBOL source code that can be compiled and executed on a mainframe system (MVS or VSE), a Wintel system with Micro Focus or a UNIX system with Micro Focus. A COBOL copy file is not an absolute requirement, since an optional feature of the SimoTime technologies provides for data conversion based on the position of each field within the record.
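A minimal Python sketch of the position-based approach follows. It is not the SimoTime implementation; it assumes the layout is already known as hypothetical (offset, length, kind) entries, that text fields are EBCDIC code page 037 (Python's built-in `cp037` codec), and that packed and binary fields are copied byte for byte.

```python
from typing import List, Tuple

# (offset, length, kind) for each field; kind is "text", "packed" or "binary".
Field = Tuple[int, int, str]


def convert_record(record: bytes, layout: List[Field]) -> bytes:
    """Translate EBCDIC text fields to ASCII by position; leave numeric fields untouched."""
    out = bytearray(record)  # bytes not covered by the layout pass through unchanged
    for offset, length, kind in layout:
        end = offset + length
        if end > len(record):
            raise ValueError(f"field at offset {offset} runs past the end of the record")
        if kind == "text":
            # cp037 maps every EBCDIC byte to one Latin-1 character, so the length is preserved
            out[offset:end] = record[offset:end].decode("cp037").encode("latin-1")
        elif kind not in ("packed", "binary"):
            raise ValueError(f"unsupported field kind: {kind}")
        # packed (COMP-3) and binary (COMP) bytes are deliberately left as-is
    return bytes(out)


# Hypothetical layout: 5-byte name, 3-byte COMP-3 amount, 2-byte COMP count.
LAYOUT = [(0, 5, "text"), (5, 3, "packed"), (8, 2, "binary")]
record = "HELLO".encode("cp037") + b"\x12\x34\x5c" + b"\x00\x0a"
converted = convert_record(record, LAYOUT)
# converted == b"HELLO" + b"\x12\x34\x5c" + b"\x00\x0a"
```

Only the text bytes change; the numeric bytes come through exactly as they were on the mainframe.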
If the requirement is only to convert the file format so the data can be transferred with FTP, the REPRO function of IDCAMS may be the solution. REPRO copies records between VSAM and sequential data sets in either direction. The following example uses REPRO to load a flat, sequential file into a new VSAM KSDS (the input records must already be in ascending key sequence); swapping the roles of the files (a KSDS as INFILE and a sequential file as OUTFILE) unloads a KSDS to a flat, sequential file.
//IDCAMSJ2 JOB SIMOTIME,ACCOUNT,CLASS=1,MSGCLASS=0,NOTIFY=CSIP1
//* *******************************************************************
//* This program is provided by: SimoTime Enterprises                 *
//* (C) Copyright 1987-2012 All Rights Reserved                       *
//* Web Site URL: http://www.simotime.com                             *
//* e-mail: helpdesk@simotime.com                                     *
//* *******************************************************************
//*
//* TEXT   - COPY (OR REPRO) A SEQUENTIAL FILE TO A KSDS
//* AUTHOR - SIMOTIME ENTERPRISES
//* DATE   - JANUARY 01, 1989
//*
//         EXEC PGM=IDCAMS
//SYSPRINT DD  SYSOUT=A
//SEQGET01 DD  DSN=SIMOTIME.DATA.SEQGET01,DISP=(SHR)
//KSDPUT01 DD  DSN=SIMOTIME.DATA.KSDPUT01,
//             SPACE=(TRK,(10,1),RLSE),
//             DISP=(NEW,CATLG,DELETE),
//             LRECL=80,KEYOFF=0,KEYLEN=12,RECORG=KS
//SYSIN    DD  *
  REPRO -
    INFILE(SEQGET01) -
    OUTFILE(KSDPUT01)
/*
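When converting a sequential file that is downloaded from a mainframe system (EBCDIC-encoded) to an indexed file that is created on a Windows system (ASCII-encoded), it may be necessary to adjust for the differences between the ASCII and EBCDIC collating sequences. In EBCDIC, lowercase letters sort before uppercase letters and letters sort before digits; in ASCII, digits sort before uppercase letters, which sort before lowercase letters. This is especially important if the field that determines the sequence of the file is alphanumeric. If the key field contains only unsigned numeric digits, sequencing should not be a problem because the digits 0-9 collate in the same relative order in both encodings.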
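The difference is easy to see by sorting the same key values both ways. This Python sketch uses code page 037 as a stand-in for the mainframe's EBCDIC code page (an assumption; a site may use another EBCDIC variant, although letters and digits sit in the same places in the common ones).

```python
keys = ["A100", "1000", "a100", "Z999"]

# Sort the same key values by their EBCDIC bytes and by their ASCII bytes.
ebcdic_order = sorted(keys, key=lambda k: k.encode("cp037"))
ascii_order = sorted(keys, key=lambda k: k.encode("ascii"))

print("EBCDIC:", ebcdic_order)  # ['a100', 'A100', 'Z999', '1000']
print("ASCII: ", ascii_order)   # ['1000', 'A100', 'Z999', 'a100']

# Keys made only of unsigned digits keep the same relative order in both encodings.
numeric = ["0042", "0007", "1234"]
same_order = (sorted(numeric, key=lambda k: k.encode("cp037"))
              == sorted(numeric, key=lambda k: k.encode("ascii")))
print(same_order)  # True
```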
If both the file format (sequential to indexed with an alphanumeric key) and the file content (EBCDIC to ASCII) are converted in a single pass, the new indexed file must be built with an unordered (random) load. Records that were in ascending key sequence in EBCDIC may be out of sequence once their keys are in ASCII, so an ordered (sequential) load would fail with an out-of-sequence error when adding the records to the new indexed file.
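The following Python sketch is a simplified, in-memory model of why the single-pass case needs an unordered load. `ordered_load` and `unordered_load` are hypothetical helpers, not a real file handler: records that arrive in EBCDIC key order are rejected by the sequential model once the keys are in ASCII, while the load-by-key model accepts them in any order.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class SequenceError(Exception):
    """Raised when an ordered load receives a key that is not ascending (like file status 21)."""


def ordered_load(records: Iterable[bytes], key_of: Callable[[bytes], bytes]) -> List[Tuple[bytes, bytes]]:
    """Model of a sequential load: each key must be higher than the previous one."""
    loaded: List[Tuple[bytes, bytes]] = []
    for rec in records:
        key = key_of(rec)
        if loaded and key <= loaded[-1][0]:
            raise SequenceError(f"key {key!r} is not higher than {loaded[-1][0]!r}")
        loaded.append((key, rec))
    return loaded


def unordered_load(records: Iterable[bytes], key_of: Callable[[bytes], bytes]) -> List[Tuple[bytes, bytes]]:
    """Model of a random load: records are placed by key in any arrival order."""
    by_key: Dict[bytes, bytes] = {}
    for rec in records:
        key = key_of(rec)
        if key in by_key:
            raise ValueError(f"duplicate key {key!r}")  # a random WRITE would get file status 22
        by_key[key] = rec
    return sorted(by_key.items())


def key4(rec: bytes) -> bytes:
    return rec[:4]  # hypothetical 4-byte key at offset 0


# Records in EBCDIC key order (as unloaded from the mainframe), converted to ASCII in the same pass.
ebcdic_sorted = sorted(["a100DATA", "A100DATA", "1000DATA"], key=lambda s: s.encode("cp037"))
ascii_records = [s.encode("ascii") for s in ebcdic_sorted]

try:
    ordered_load(ascii_records, key4)
except SequenceError as err:
    print("ordered load failed:", err)

print([k for k, _ in unordered_load(ascii_records, key4)])  # [b'1000', b'A100', b'a100']
```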
An alternate index should not be created until after the file has been converted. Because the conversion changes the value (that is, the encoding format) of the alternate key, the alternate index should be built only after the data that makes up the alternate key has been converted.
Building the alternate index may be done using IDCAMS and JCL (Mainframe Express or ES/MTO with the Batch Facility) or with the Micro Focus BLDINDEX utility.
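To illustrate why the alternate index must come after the content conversion, the sketch below builds a simple in-memory alternate index (a hypothetical stand-in for the result of BLDINDEX) from the same records before and after translation. An index built from EBCDIC key values cannot be searched with ASCII keys.

```python
from typing import List, Tuple


def to_ascii(rec: bytes) -> bytes:
    """Translate an all-text EBCDIC (cp037) record to single-byte ASCII/Latin-1."""
    return rec.decode("cp037").encode("latin-1")


def build_alt_index(records: List[bytes], prim: Tuple[int, int],
                    alt: Tuple[int, int]) -> List[Tuple[bytes, bytes]]:
    """Return (alternate key, primary key) pairs in alternate-key order.

    prim and alt are (offset, length) pairs giving where each key sits in the record.
    """
    (p_off, p_len), (a_off, a_len) = prim, alt
    return sorted((r[a_off:a_off + a_len], r[p_off:p_off + p_len]) for r in records)


def lookup(index: List[Tuple[bytes, bytes]], alt_key: bytes) -> List[bytes]:
    """Return the primary keys that share the given alternate key."""
    return [p for a, p in index if a == alt_key]


# Hypothetical layout: 4-byte primary key followed by a 5-byte name used as the alternate key.
ebcdic_records = [s.encode("cp037") for s in ("0001SMITH", "0002jones")]

too_early = build_alt_index(ebcdic_records, (0, 4), (4, 5))  # built before conversion
after = build_alt_index([to_ascii(r) for r in ebcdic_records], (0, 4), (4, 5))

print(lookup(too_early, b"SMITH"))  # [] because the index still holds EBCDIC key values
print(lookup(after, b"SMITH"))      # [b'0001']
```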
Each record type within a file should be treated as if the record type were a separate file. If a file contains five (5) different record types then a process must be implemented that will determine the record type and then pass control to the appropriate conversion routine for the record type. The technique used by the SimoTime technology greatly simplifies this effort. Since the SimoTime technology generates a callable routine based on a COBOL copy file it will be necessary to create a callable routine for each record type. Once this is done the user logic that determines the record type may be "cut" from an existing COBOL program that accesses the file and pasted into the SimoTime generated COBOL source code that performs the file I/O. The file I/O program will then call the appropriate conversion routine based on the "cut-and-pasted" user logic.
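The record-type dispatch described above might look like this in outline. The layouts, the type codes `H` and `D`, and the routine names are hypothetical; in practice the type test would be the user logic cut from an existing program. The type byte is tested in its original EBCDIC form, before any conversion.

```python
from typing import Callable, Dict


def to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC (cp037) text bytes to single-byte ASCII/Latin-1."""
    return data.decode("cp037").encode("latin-1")


def convert_header(rec: bytes) -> bytes:
    """Hypothetical header layout: the whole record is text."""
    return to_ascii(rec)


def convert_detail(rec: bytes) -> bytes:
    """Hypothetical detail layout: 1-byte type + 4-byte account (text), then a COMP-3 amount."""
    return to_ascii(rec[:5]) + rec[5:]  # the packed amount is copied unchanged


# Keyed by the record-type byte as it appears in the EBCDIC input.
CONVERTERS: Dict[int, Callable[[bytes], bytes]] = {
    "H".encode("cp037")[0]: convert_header,
    "D".encode("cp037")[0]: convert_detail,
}


def convert_by_type(rec: bytes) -> bytes:
    """Determine the record type, then pass control to the routine for that type."""
    if not rec:
        raise ValueError("empty record")
    routine = CONVERTERS.get(rec[0])
    if routine is None:
        raise ValueError(f"unknown record type x'{rec[0]:02X}'")
    return routine(rec)


header = convert_by_type("HREPORT".encode("cp037"))
detail = convert_by_type("DACCT".encode("cp037") + b"\x01\x23\x4d")
```

Adding a record type means adding one routine and one table entry; the file I/O loop stays the same.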
If a file contains variable length records created by using blank truncation techniques then the conversion process is similar to the process used to convert files with fixed length records.
If the variable length records are built dynamically by appending a variety of structured data segments to a base segment of a record, then each segment must be treated as a separate entity and converted accordingly. In other words, a separate routine should be generated for each segment of data, and the user logic from an existing program should be used to determine each appended segment's type, length and structure and to convert the segment accordingly. Allow additional time to convert files with this type of content structure.
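A sketch of segment-by-segment conversion follows, assuming a hypothetical layout in which each appended segment starts with a 1-byte EBCDIC type code and a 1-byte binary length. Real files will define segment type and length differently, and that logic should come from the existing program.

```python
from typing import Callable, Dict


def to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC (cp037) text bytes to single-byte ASCII/Latin-1."""
    return data.decode("cp037").encode("latin-1")


def keep(data: bytes) -> bytes:
    """Packed or binary segment data is not translated."""
    return data


# Hypothetical segment types, keyed by their EBCDIC type code.
SEGMENT_CONVERTERS: Dict[int, Callable[[bytes], bytes]] = {
    "A".encode("cp037")[0]: to_ascii,  # address text
    "P".encode("cp037")[0]: keep,      # COMP-3 amount
}


def convert_segmented(rec: bytes, base_len: int) -> bytes:
    """Convert a text base segment plus appended segments laid out as type, length, data."""
    out = bytearray(to_ascii(rec[:base_len]))
    pos = base_len
    while pos < len(rec):
        if pos + 2 > len(rec):
            raise ValueError(f"truncated segment header at offset {pos}")
        seg_type, seg_len = rec[pos], rec[pos + 1]  # the length byte is a binary value
        data = rec[pos + 2:pos + 2 + seg_len]
        if len(data) != seg_len:
            raise ValueError(f"truncated segment data at offset {pos}")
        routine = SEGMENT_CONVERTERS.get(seg_type)
        if routine is None:
            raise ValueError(f"unknown segment type x'{seg_type:02X}' at offset {pos}")
        # type code is text, length stays binary, data uses the segment's own routine
        out += to_ascii(bytes([seg_type])) + bytes([seg_len]) + routine(data)
        pos += 2 + seg_len
    return bytes(out)


record = ("CUST01".encode("cp037")
          + "A".encode("cp037") + bytes([4]) + "MAIN".encode("cp037")
          + "P".encode("cp037") + bytes([2]) + b"\x12\x3c")
converted = convert_segmented(record, base_len=6)
```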
The IBM mainframe has a variety of numeric formats, or encoding schemes, and this mix has challenged mainframe programmers since the formats were introduced. When migrating data, the numeric formats require special consideration. BINARY and PACKED numeric fields should not be converted (or translated) from EBCDIC to ASCII. These fields are supported on a mainframe system and in a Micro Focus environment running on Windows or UNIX when a mainframe dialect is used.
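The Python sketch below decodes a packed-decimal (COMP-3) field and shows that pushing the same bytes through an EBCDIC-to-ASCII table changes them. `unpack_comp3` is a hypothetical helper written for illustration; it treats sign nibbles A, C, E and F as positive and B and D as negative. The binary example assumes the mainframe's big-endian byte order for COMP fields.

```python
from decimal import Decimal


def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field; scale is the number of implied decimal places."""
    if not data:
        raise ValueError("empty packed-decimal field")
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0x0F]
    *digits, sign = nibbles  # the last nibble holds the sign
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError("invalid packed-decimal data")  # the S0C7 / error 163 situation
    value = int("".join(str(d) for d in digits))
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


amount = b"\x12\x34\x5c"              # +12345 as PIC S9(5) COMP-3
print(unpack_comp3(amount, scale=2))  # 123.45

# Pushing the same bytes through an EBCDIC-to-ASCII table corrupts the value.
translated = amount.decode("cp037").encode("latin-1")
print(translated != amount)           # True

# A PIC S9(4) COMP field holding 10 is the two bytes x'000A' in big-endian order.
count = int.from_bytes(b"\x00\x0a", "big", signed=True)
```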
| Numeric Type | Description |
| Zoned Decimal | The zoned-decimal format is coded in COBOL as USAGE IS DISPLAY and is the default format if the USAGE clause is omitted. Note: This format is the slowest performer and uses the most storage space, but it is the easiest to display on a screen or print. This encoding scheme may be unsigned (implied positive) or signed. Signed fields require special handling of the sign position when migrating from a mainframe (EBCDIC) to a Micro Focus (ASCII) environment. |
| Packed Decimal | The packed-decimal format is coded in COBOL as USAGE IS COMPUTATIONAL-3, usually abbreviated to COMP-3. Note: The mainframe can perform arithmetic on this format at the hardware (or micro-code) level. This encoding scheme was primarily used to save storage space. It may be unsigned (implied positive) or signed. When migrating from a mainframe (EBCDIC) to a Micro Focus (ASCII) environment, this type of field should be left in its original format because it is supported in the new environment. |
| Binary | The binary format is coded in COBOL as USAGE IS COMPUTATIONAL, usually abbreviated to COMP; it may also be coded with the keyword BINARY. Note: This format saves storage space but was primarily used for performance, since register arithmetic uses this format. This encoding scheme may be unsigned (implied positive) or signed. When migrating from a mainframe (EBCDIC) to a Micro Focus (ASCII) environment, this type of field should be left in its original format because it is supported in the new environment. |
| Edited Numeric | The edited numeric format is coded in COBOL using an edit mask in the picture clause, for example PIC ZZZ.99+. Note: This type of field holds numbers that are to be displayed or printed and consists entirely of text characters, so it should be converted using the standard conversion tables. |
| Floating Point | This format is used when a high level of precision or very large numbers are required. On the mainframe the default is IBM System/370 hexadecimal floating point. On Windows or UNIX with Micro Focus the default is the IEEE Standard for Floating-Point Arithmetic. The two encodings are not bit-compatible. For single precision (COMP-1), IEEE generally keeps more significant bits (24, versus 21 to 24 for the 370 format, which normalizes in hexadecimal digits), while the 370 format reaches larger magnitudes (about 7.2E+75 versus about 3.4E+38). For double precision (COMP-2), the IEEE format has the larger range (about 1.8E+308). |
| numbug01 | The challenge with this program is that it is expected to process the various numeric items in the same manner as the mainframe. For example, a zoned-decimal field that contains leading spaces should not cause an ABEND (or a run-time error 163 on Micro Focus) but should treat the leading spaces as zeroes and complete the arithmetic calculation. However, a packed-decimal field that contains non-numeric values causes an S0C7 (pronounced "sock-seven") data exception on the mainframe and should cause a 163 error in the Micro Focus environment. |
| numprt01 | Printing numeric fields, especially packed-decimal or binary (COMP-3 or COMP) fields, requires special consideration, as do signed zoned-decimal fields. Most numeric fields require some sort of editing before printing. This suite of programs provides examples of how a COBOL program may be used to properly print (or display) numeric fields. |
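Two of the formats in the table need more than a byte copy. The sketch below shows (1) converting the trailing sign of an EBCDIC zoned-decimal field, assuming the Micro Focus target uses the ASCII sign convention in which a negative trailing digit is stored as x'70'-x'79' (verify the sign convention your compiler settings actually use), and (2) decoding an IBM hexadecimal floating point value into a Python (IEEE double) float. Both helper names are hypothetical, and leading spaces (the numbug01 case) are not handled.

```python
import math


def zoned_ebcdic_to_ascii(field: bytes) -> bytes:
    """Convert an EBCDIC zoned-decimal field whose sign is in the zone of the last byte.

    Assumes the ASCII sign convention: positive or unsigned last digit x'30'-x'39',
    negative last digit x'70'-x'79'.
    """
    if not field:
        raise ValueError("empty field")
    out = bytearray()
    for byte in field[:-1]:
        if (byte >> 4) != 0xF or (byte & 0x0F) > 9:
            raise ValueError(f"invalid zoned digit x'{byte:02X}'")
        out.append(0x30 + (byte & 0x0F))
    zone, digit = field[-1] >> 4, field[-1] & 0x0F
    if digit > 9:
        raise ValueError(f"invalid zoned digit x'{field[-1]:02X}'")
    if zone in (0xC, 0xF, 0xA, 0xE):   # positive or unsigned
        out.append(0x30 + digit)
    elif zone in (0xD, 0xB):           # negative
        out.append(0x70 + digit)
    else:
        raise ValueError(f"invalid sign zone x'{field[-1]:02X}'")
    return bytes(out)


def ibm_hex_to_float(data: bytes) -> float:
    """Decode a 4-byte (COMP-1) or 8-byte (COMP-2) System/370 hexadecimal float.

    Layout: 1 sign bit, 7-bit exponent (excess 64, power of 16), 24- or 56-bit fraction.
    """
    if len(data) not in (4, 8):
        raise ValueError("expected 4 or 8 bytes")
    bits = int.from_bytes(data, "big")
    frac_bits = len(data) * 8 - 8
    sign = bits >> (frac_bits + 7)
    exponent = (bits >> frac_bits) & 0x7F
    fraction = bits & ((1 << frac_bits) - 1)
    # value = 0.fraction (base 16) * 16 ** (exponent - 64); a 56-bit fraction is rounded to 53 bits
    value = math.ldexp(fraction, 4 * (exponent - 64) - frac_bits)
    return -value if sign else value


print(zoned_ebcdic_to_ascii(b"\xF1\xF2\xD3"))            # b'12s' (-123)
print(ibm_hex_to_float(bytes.fromhex("C276A000")))       # -118.625
```

For comparison, a straight table translation of x'F1F2D3' produces the overpunched characters `12L`, which is why signed zoned fields need their own handling.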
The following items should be considered when thinking about expanding packed or binary numeric fields.
Table: Considerations for Expanding Numeric Fields
Binary, packed and floating point numeric fields may be used in Record Sequential files that are EBCDIC- or ASCII-encoded. They cause problems in a Line Sequential (ASCII/text) file, because their bytes can include values such as x'0A' (line feed) or x'0D' (carriage return) that text-oriented processing treats as record delimiters, as well as other non-printable values.
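The sketch below shows the problem with a hypothetical record made of a 5-byte text account id and a 2-byte big-endian binary (COMP) amount. The value 10 is stored as x'000A', and that x'0A' byte splits the record when the data is read as text lines. Expanding the field into a sign-leading display value (one possible format choice) keeps each record on one line, at the cost of a wider field.

```python
# Hypothetical record layout: 5-byte account id (text) + 2-byte signed binary (COMP) amount.
records = [
    b"ACCT1" + (10).to_bytes(2, "big", signed=True),   # amount bytes x'000A'
    b"ACCT2" + (-13).to_bytes(2, "big", signed=True),  # amount bytes x'FFF3'
]

# Writing the raw bytes as text lines: x'0A' inside the binary field reads back as a line break.
raw_text = b"".join(rec + b"\n" for rec in records)
print(len(raw_text.splitlines()))  # 3 lines for 2 records


def expand(rec: bytes) -> bytes:
    """Expand the 2-byte binary amount into a 6-byte sign-leading display field."""
    amount = int.from_bytes(rec[5:7], "big", signed=True)
    return rec[:5] + f"{amount:+06d}".encode("ascii")


expanded_text = b"".join(expand(rec) + b"\n" for rec in records)
print(expanded_text.splitlines())  # [b'ACCT1+00010', b'ACCT2-00013']
```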
A basic requirement is to ensure that the number of records read from each input file equals the number of records written to the corresponding output file. The SimoTime technology has an option to provide read and write counts for an ordered load of an indexed file; for an unordered load it provides read, write and update counts.
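A record-count reconciliation can be built into the conversion driver itself. This is a sketch for fixed-length records only, using a hypothetical `convert_fixed_file` function; it also keeps a SHA-256 digest of the input as one possible audit-trail item, which is an addition of this example rather than a SimoTime feature.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable


@dataclass
class AuditCounts:
    read: int = 0
    written: int = 0
    input_sha256: str = ""


def convert_fixed_file(src: BinaryIO, dst: BinaryIO, lrecl: int,
                       convert: Callable[[bytes], bytes]) -> AuditCounts:
    """Convert a fixed-length file record by record and reconcile the counts."""
    counts = AuditCounts()
    digest = hashlib.sha256()
    while True:
        rec = src.read(lrecl)
        if not rec:
            break
        if len(rec) != lrecl:
            raise ValueError(f"record {counts.read + 1} is {len(rec)} bytes, expected {lrecl}")
        counts.read += 1
        digest.update(rec)
        out = convert(rec)
        if len(out) != lrecl:
            raise ValueError(f"record {counts.read} changed length during conversion")
        dst.write(out)
        counts.written += 1
    counts.input_sha256 = digest.hexdigest()
    # Explicit reconciliation step, kept so the audit trail records that it was checked.
    if counts.read != counts.written:
        raise RuntimeError(f"read {counts.read} records but wrote {counts.written}")
    return counts


src = io.BytesIO("AAAABBBB".encode("cp037"))
dst = io.BytesIO()
result = convert_fixed_file(src, dst, 4, lambda r: r.decode("cp037").encode("latin-1"))
print(result.read, result.written, dst.getvalue())  # 2 2 b'AAAABBBB'
```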
Management and/or the auditors/consultants should be involved in the data migration (and data file content conversion) process as early as possible. This will help ensure the process will meet the requirements and maintain the necessary level of data integrity at each step in the process.
When converting data files an existing definition (such as a COBOL copy file) of the data structures (or record layouts) should be used. This will avoid introducing errors in a process that requires the data to be defined by a new or proprietary format.
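As an illustration of deriving the layout from an existing definition rather than a new format, the sketch below computes field offsets and lengths from a simple copy file. It is deliberately limited (hypothetical `parse_copybook`; elementary PIC items only, with no OCCURS, REDEFINES, VALUE clauses, SIGN SEPARATE or P scaling) and uses the standard IBM storage sizes: display items take one byte per symbol, COMP-3 takes digits/2 + 1 bytes, and COMP/BINARY takes 2, 4 or 8 bytes for up to 4, 9 or 18 digits.

```python
import re
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    name: str
    offset: int
    length: int
    kind: str  # "text", "zoned", "packed" or "binary"


_ITEM = re.compile(
    r"^\s*\d{2}\s+(?P<name>[A-Z0-9-]+)\s+PIC(?:TURE)?\s+(?:IS\s+)?(?P<pic>\S+?)"
    r"(?:\s+(?:USAGE\s+(?:IS\s+)?)?(?P<usage>COMPUTATIONAL-3|COMP-3|PACKED-DECIMAL"
    r"|COMPUTATIONAL|COMP|BINARY|DISPLAY))?\s*\.\s*$",
    re.IGNORECASE,
)


def _expand(pic: str) -> str:
    """Expand repetition factors, e.g. S9(5)V99 -> S99999V99."""
    return re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)


def _size(pic: str, usage: str) -> Tuple[int, str]:
    symbols = _expand(pic.upper())
    digits = symbols.count("9")
    if usage in ("COMP-3", "COMPUTATIONAL-3", "PACKED-DECIMAL"):
        return digits // 2 + 1, "packed"
    if usage in ("COMP", "COMPUTATIONAL", "BINARY"):
        return (2 if digits <= 4 else 4 if digits <= 9 else 8), "binary"
    stored = symbols.replace("S", "").replace("V", "")  # implied sign and decimal take no space
    return len(stored), ("zoned" if set(stored) == {"9"} else "text")


def parse_copybook(text: str) -> List[Field]:
    """Return offset, length and kind for each elementary item, in record order."""
    fields: List[Field] = []
    offset = 0
    for line in text.splitlines():
        if re.search(r"\b(OCCURS|REDEFINES)\b", line, re.IGNORECASE):
            raise NotImplementedError(f"not handled by this sketch: {line.strip()}")
        m = _ITEM.match(line)
        if m is None:
            if re.search(r"\bPIC(TURE)?\b", line, re.IGNORECASE):
                raise ValueError(f"unsupported elementary item: {line.strip()}")
            continue  # group items, 88-levels and blank lines take no storage here
        length, kind = _size(m["pic"], (m["usage"] or "DISPLAY").upper())
        fields.append(Field(m["name"].upper(), offset, length, kind))
        offset += length
    return fields


COPYBOOK = """
       01  CUSTOMER-RECORD.
           05  CUST-ID        PIC X(6).
           05  CUST-BALANCE   PIC S9(7)V99 COMP-3.
           05  CUST-COUNT     PIC 9(4) COMP.
           05  CUST-RATING    PIC S9(3).
"""
for field in parse_copybook(COPYBOOK):
    print(field)
```

The resulting offsets, lengths and kinds are the same information a position-based converter needs, so the record definition is stated once, in the copy file.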
The data conversion process should be repeatable, with an audit trail, and should be executable as an automated, unattended process. Operator intervention during the conversion process should be considered an exposure to introducing errors.
Here is what we have done over the years when faced with converting data files between ASCII and EBCDIC. In the world of programming there are many ways to solve a problem; this document and the linked documents are intended to provide a choice of alternatives.
Table: The Evolution of Data File Conversion Techniques
Today, we take the following approach.
Table: Data File Conversion Techniques used in Today's Environments
Anyone considering a data conversion should seek additional assistance from a consulting or programming services organization that has experience in this area.
Permission to use, copy, modify and distribute this software, documentation or training material for any purpose requires a fee to be paid to SimoTime Enterprises. Once the fee is received by SimoTime, the latest version of the software, documentation or training material will be delivered and a license will be granted for use within an enterprise, provided the SimoTime copyright notice appears on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Enterprises.
SimoTime Enterprises makes no warranty or representations about the suitability of the software, documentation or learning material for any purpose. It is provided "AS IS" without any expressed or implied warranty, including the implied warranties of merchantability, fitness for a particular purpose and non-infringement. SimoTime Enterprises shall not be liable for any direct, indirect, special or consequential damages resulting from the loss of use, data or projects, whether in an action of contract or tort, arising out of or in connection with the use or performance of this software, documentation or training material.
This section includes links to documents with additional information that is beyond the scope and purpose of this document. The first sub-section requires an internet connection; the second sub-section references locally available documents.
The following links require an internet connection.
For information about data file management in a diverse or mixed systems environment refer to the Series of White Papers for non-relational data files.
A good place to start is The SimoTime Home Page for access to white papers, program examples and product information.
Explore The Numbers Connection in the SimoTime Library for more examples of programs and documentation that describe and demonstrate techniques for understanding and processing the various numeric field formats used in a mainframe environment.
Explore The ASCII and EBCDIC translation tables. These tables are provided for individuals that need to better understand the bit structures and differences of the encoding formats.
Explore The File Status Return Codes to interpret the results of accessing VSAM data sets and QSAM files.
Explore The Micro Focus Web Site for more information about products and services available from Micro Focus.
The following links should be accessible without an internet connection.
Check out The SimoTime Glossary for a list of terms and definitions used in the documents provided by SimoTime.
This document was created and is maintained by SimoTime Enterprises.
If you have any questions, suggestions, comments or feedback please call or send an e-mail to: helpdesk@simotime.com
We appreciate hearing from you.
Founded in 1987, SimoTime Enterprises is a privately owned company. We specialize in the creation and deployment of business applications using new or existing technologies and services. We have a team of individuals that understand the broad range of technologies being used in today's environments. This includes the smallest thin client using the Internet and the very large mainframe systems. There is more to making the Internet work for your company's business than just having a nice looking WEB site. It is about combining the latest technologies and existing technologies with practical business experience. It's about the business of doing business and looking good in the process. Quite often, to reach larger markets or provide a higher level of service to existing customers it requires the newer Internet technologies to work in a complementary manner with existing corporate mainframe systems.
Whether you want to use the Internet to expand into new market segments or as a delivery vehicle for existing business functions simply give us a call or check the web site at http://www.simotime.com