Micro Focus File Formats
When technology complements business   Data Management Series
Copyright © 1987-2008  SimoTime Enterprises, LLC  All Rights Reserved  http://www.simotime.com

 
Version 07.10.05

  Introduction
  File Format Overview
  Sequential Files
    Sequential Files, Line Sequential or ASCII/Text
    Sequential Files, Record Sequential
    Sequential Files, Record Sequential, Fixed Length Records
    Sequential Files, Record Sequential, Variable Length Records
  Indexed Files
  Generation Data Groups (or GDG)
  Possibilities & Considerations
    Possibilities & Considerations, Report Files
    Possibilities & Considerations, Numeric Field Types
    Possibilities & Considerations, Micro Focus Directives
      Directives, File Access Methods, use of Record Buffers before an Open
      Directives, Numeric Formats
  Summary
  Software Agreement and Disclaimer
  Downloads and Links to Similar Pages
  Glossary of Terms
  Comments or Suggestions
  About SimoTime

Introduction

This document describes the various data file formats used by Micro Focus. The intent is to provide an overview of the different file structures. For detailed descriptions of the various file structures or file systems refer to the Micro Focus documentation.

This document is intended to provide information to individuals who are migrating an application, migrating data files or sharing data files between an IBM Mainframe and a Windows or UNIX system using Micro Focus technologies.

A special "Thank you" to Larry Simmons of Micro Focus for providing much of the information that is presented in this series of white papers and sample programs.

File Format Overview

This section describes various file formats and the file handling environments used by Micro Focus. For the development environment the base file handler may be the preferred environment. For the Test and Production environment the Micro Focus File Share running on a separate server may be the preferred environment for Indexed Files.

Microsoft Windows supports several disk formats. FAT (or File Allocation Table) is the older technology used prior to Windows 2000. The FAT format has a limit of 2 gigabytes per file. The FAT32 format raised the limit to 4 gigabytes per file. The NTFS format removed the 4-gigabyte limit.

Many external USB storage devices ship formatted as FAT. This is the lowest common denominator for moving data between platforms and is supported by Windows and UNIX systems. Micro Focus files that are smaller than 2 gigabytes may easily be moved across the three disk formats. Files greater than 2 gigabytes and less than 4 gigabytes in size may easily be moved between FAT32 and NTFS. Files that are larger than 4 gigabytes require the NTFS format.

Sequential Files

The sequential files may be divided into two groups, Line Sequential and Record Sequential. The Line Sequential files are associated with ASCII/Text files and usually contain variable length records with a record separator value (this may be a one or two byte value) between each record. Depending on how a Line Sequential file is created it could have fixed length records with spaces as the trailing pad characters.

A Record Sequential file may contain fixed length or variable length records. A record sequential file with fixed length records is a series of concatenated records or data strings of a predefined length. A record sequential file with variable length records is a series of concatenated records or data strings of varying lengths, each preceded by a record descriptor word (RDW) that defines the length of the record. For variable length files a header record is placed at the start of the file when the file is created.

The two types of sequential files are discussed in more detail in the following sections of this document.

Sequential Files, Line Sequential or ASCII/Text

Line Sequential files are ASCII/Text files. Depending on how the file is created it may have fixed or variable length records. The separation of the individual records is maintained by the use of delimiter bytes between the records. For Windows this is usually a two-byte value consisting of a Carriage-Return and Line-Feed (or CRLF); the hexadecimal notation is x'0D' and x'0A'. For UNIX systems this is usually a one-byte value consisting of a Line-Feed (or LF); the hexadecimal notation is x'0A'.

The data strings that are included in a record should be display or print text using the ASCII encoding format. Hence the name ASCII/Text files. These files should not contain packed or binary data strings.
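
The record delimiters described above can be illustrated with a minimal Python sketch. It is not part of any Micro Focus product; the function names and sample data are hypothetical and only show how CRLF-delimited and LF-delimited content yield the same records, and how a record may be padded with trailing spaces to a fixed length.

```python
def split_line_sequential(data: bytes) -> list[bytes]:
    """Split line sequential content into records, accepting CRLF or LF."""
    records = data.split(b"\n")
    # A delimiter after the last record leaves one empty element; drop it.
    if records and records[-1] == b"":
        records.pop()
    # Remove the carriage return that Windows places before each line feed.
    return [r[:-1] if r.endswith(b"\r") else r for r in records]


def pad_record(record: bytes, length: int) -> bytes:
    """Pad a record with trailing spaces to a fixed length."""
    return record.ljust(length, b" ")


# Hypothetical sample data: the same two records with each delimiter style.
windows_data = b"ALPHA\r\nBETA\r\n"
unix_data = b"ALPHA\nBETA\n"

print(split_line_sequential(windows_data))
print(split_line_sequential(unix_data))
print(pad_record(b"BETA", 8))
```

Both calls to split_line_sequential return the same list of records, which is why a text file moved between Windows and UNIX usually only needs its delimiters adjusted.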

Sequential Files, Record Sequential

A Record Sequential file may contain fixed records or variable length records. The record sequential files with fixed length are a series of concatenated records or data strings of a predefined length without record separator values between each record. The first byte of the first record starts at the first byte of the file.

A record sequential file with variable length records is a series of concatenated records or data strings of varying lengths without record separator values between each record. Each record is preceded by a record descriptor word (RDW) that defines the length of the record that follows. A Micro Focus header record is placed at the start of the file when the file is created.

Sequential Files, Record Sequential, Fixed Length Records

Data is stored in records with predefined, fixed lengths concatenated into a contiguous string of data from the beginning to the end of the file. The file size must be a multiple of the record length. Record separator bytes are not used. There is no Micro Focus header record. The record length is determined by the record definition in the COBOL program. When using the MFE or ES/MTO Batch Facility and the file utility programs, the record length is determined from the catalog or the LRECL value included with the JCL. The Data File Converter (DFCONV) uses a filename.PRO file to store the file information. If the .PRO file does not exist the user is prompted for the file information and this information is then stored in a .PRO file.

This type of file may be EBCDIC or ASCII encoded and may contain packed, binary or floating-point fields.
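
Because there are no separators and no header, a fixed length record sequential file can be split using the record length alone. The following Python sketch is illustrative only; the function name, the 10-byte record length and the sample data are assumptions made for the example.

```python
def read_fixed_records(data: bytes, record_length: int) -> list[bytes]:
    """Split the content of a fixed length record sequential file into records."""
    if record_length <= 0:
        raise ValueError("record length must be positive")
    # The file size must be an exact multiple of the record length.
    if len(data) % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


# Hypothetical file content: two 10-byte records with no separators.
sample = b"0001SMITH 0002JONES "

for record in read_fixed_records(sample, 10):
    print(record)
```

The length check is a cheap way to detect a wrong LRECL or a file that was transferred in text mode, since either one usually leaves a size that is not a multiple of the record length.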

Sequential Files, Record Sequential, Variable Length Records

Data is stored in records with variable lengths. A Record Descriptor Word (RDW) of two (2) bytes prefixes each record. A 128-byte header record is at the beginning of the file. This header record is followed by a combination of the RDW and record that may have trailing low-values to keep records aligned on a word boundary. The records are then concatenated into a contiguous string of data from the beginning to the end of the file.

The mainframe variable length files that are transferred via FTP from the mainframe need special handling in order to get the RDW information to precede the variable length records. The RDW for mainframe variable length files is four (4) bytes with the first two (2) bytes being the record length. An optional Block Descriptor Word (BDW) may also be present in a mainframe variable length file. The BDW is four (4) bytes in length. If the same FTP syntax that is used for fixed-length records is used to transfer variable length records the file will be transferred without the RDW information. Special FTP statements are needed to have the RDW information included.

This type of file may be EBCDIC or ASCII encoded and may contain packed, binary or floating-point fields.
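
The mainframe RDW layout can be sketched in Python as follows. This is a hedged illustration, not a Micro Focus utility. It assumes the file was transferred with the RDWs included and without BDWs, and that the two-byte length is a big-endian value that counts the four RDW bytes as well as the data, which is the usual mainframe convention. It does not handle the Micro Focus variable length format with its 128-byte header. The function name and sample bytes are hypothetical.

```python
import struct


def split_mainframe_vb(data: bytes) -> list[bytes]:
    """Split mainframe variable length data into records using the 4-byte RDW."""
    records = []
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError("truncated RDW at offset %d" % offset)
        # First two bytes of the RDW: record length, big-endian,
        # assumed to include the 4 bytes of the RDW itself.
        (length,) = struct.unpack_from(">H", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError("invalid RDW length at offset %d" % offset)
        records.append(data[offset + 4:offset + length])
        offset += length
    return records


# Hypothetical sample: a 5-byte record and a 2-byte record, each with an RDW.
sample = b"\x00\x09\x00\x00HELLO" + b"\x00\x06\x00\x00AB"

print(split_mainframe_vb(sample))
```

The record content is returned as raw bytes so that EBCDIC text, packed and binary fields are left untouched for a later, field-aware conversion.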

Indexed Files

Indexed (or Key-Sequenced) files are the Micro Focus file type used for VSAM KSDS (Key-Sequenced Data Sets). IDXFORMAT(2) is the default and uses two files (normally a filename.DAT for the data and a filename.IDX for the index). Indexed files may have an optional alternate index or multiple alternate indices with or without duplicate keys (the primary key must be unique). The following table is a summary of the indexed file types supported by Micro Focus.

Format  Description
2       Micro Focus Level II format.
3       Micro Focus indexed file format.
4       An optimized form of Micro Focus indexed file format, for fast duplicate key handling.
7       RLIO format indexed files.
8       Large indexed.
9       Indexed with single key, non-duplicate, key ordered records.
11      Files are formatted as mainframe print files. Remember to use the ADV compiler directive.
        Note: this is a line sequential file, not an indexed file.

The IDXFORMAT(8) is another popular format and is required if the files are larger than 2 gigabytes. With the IDXFORMAT(8) a single file contains both the data and the indices (primary key and alternate keys). The primary key must be unique and the alternate keys may contain duplicates.

This type of file may be EBCDIC or ASCII encoded and may contain packed, binary or floating-point fields. When converting this type of file between ASCII and EBCDIC it is necessary to read the original file and create a new file since the primary key value will be changing. The alternate indices for the new file should be rebuilt after the file has been created in its ASCII-encoded format. Additional time should be allocated to provide for configuring and managing index files under the control of Micro Focus File Share. Additional information for index files may be found in the Micro Focus documentation.
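
The reason the key values and their order change during an ASCII and EBCDIC conversion can be seen with a short Python sketch. The key values are hypothetical, and the example assumes code page 037 (Python codec cp037) as the EBCDIC encoding; a different EBCDIC code page may be in use at a given site.

```python
# Hypothetical primary key values containing a digit, an upper-case
# and a lower-case letter in the first position.
keys = ["A100", "a100", "1100"]

# Order of the keys when compared as ASCII bytes.
ascii_order = sorted(keys, key=lambda k: k.encode("ascii"))

# Order of the same keys when compared as EBCDIC bytes (cp037 assumed).
ebcdic_order = sorted(keys, key=lambda k: k.encode("cp037"))

print(ascii_order)
print(ebcdic_order)
```

In ASCII the digits collate before the letters, while in EBCDIC the lower-case letters come first and the digits last. A file read in key sequence on the mainframe will therefore not arrive in key sequence for the ASCII file, which is why the new file and its indices must be built rather than copied.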

Generation Data Groups (or GDG)

GDGs are sequential files. At the start of job execution the system resolves all the relative GDG references in the job stream. The best example of this is that when you create (+1) in one step you need to reference it as (+1) in subsequent steps of that job.

However, there is no renaming of the actual dataset names done at EOJ. Given this scenario:

1. LIMIT(3) is in effect
2. G0001V00, G0002V00, and G0003V00 exist at start of job
3. (+1) is created during the job (i.e. G0004V00)

At EOJ, the G0001V00 dataset will "roll off" and be disassociated from the GDG. It may be deleted as well depending on whether or not SCRATCH or NOSCRATCH has been specified for the GDG. So the datasets associated with the GDG at EOJ will actually be G0002V00, G0003V00, and G0004V00 (keeping in mind that G0001V00 may still exist and be in the catalog). The highest numbered G????V00 datasets that exist at EOJ will remain associated with the GDG as governed by the LIMIT.

For more information about GDGs refer to the Downloads and Links to Similar Pages section of this document.
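
The roll-off scenario can be modeled with a few lines of Python. This is only a sketch of the bookkeeping described here, not of how the catalog is implemented; the function names are hypothetical and the model ignores SCRATCH and NOSCRATCH, so it reports which generations roll off without deciding whether they are deleted.

```python
def generation_name(number: int) -> str:
    """Format a generation number as the GnnnnV00 dataset name suffix."""
    return "G%04dV00" % number


def roll_generations(existing: list[int], limit: int, new_count: int = 1):
    """Return (associated, rolled_off) generation names at end of job."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    next_number = max(existing, default=0) + 1
    created = [next_number + i for i in range(new_count)]
    all_generations = sorted(existing) + created
    # The highest numbered generations stay associated with the GDG.
    keep = all_generations[-limit:]
    rolled = all_generations[:-limit]
    return ([generation_name(n) for n in keep],
            [generation_name(n) for n in rolled])


# The scenario from the text: LIMIT(3), three generations exist, (+1) is created.
associated, rolled_off = roll_generations([1, 2, 3], limit=3)
print(associated)
print(rolled_off)
```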

Possibilities & Considerations

Micro Focus provides a wide range of directives to support many of the mainframe file, record and field formats along with processing techniques that coincide with mainframe behavior.

Possibilities & Considerations, Report Files

A typical report file on the Mainframe is an EBCDIC-encoded, record sequential file of 133-byte records with the first byte being used for carriage control. With Micro Focus Mainframe Express (MFE) this format is maintained.

With Micro Focus Net Express a typical report file is an ASCII-encoded, line sequential file with embedded carriage control characters.

Since the files are all printable text it is easy to convert the encoding between EBCDIC and ASCII before attempting a file compare. However, the difference in the file formats will make it difficult to do a simple file compare. To solve this problem the report files need to be created in the mainframe format. To do this it is necessary to use the FILETYPE(11) and ADV compiler directives. This will cause Net Express to create the report files using the mainframe format. After converting one of the files between EBCDIC and ASCII the files may be easily compared.

Now, we must address the task of actual printing or storing in a repository. If the files are to be printed then we must recompile the programs without the FILETYPE(11) and ADV directives or obtain a utility program from Micro Focus that will read a mainframe formatted file and write a PC or UNIX line sequential file with embedded carriage control.
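
As a rough illustration of what such a conversion involves, the following Python sketch turns mainframe-style print records into text with embedded control characters. It assumes the first byte of each record is an ASA carriage control character, that the records have already been converted to ASCII, and that the control action applies before the line is printed. The mapping and the function name are assumptions for the example; the exact bytes written by the Micro Focus run-time or by a Micro Focus utility may differ.

```python
# Assumed ASA carriage control characters and the action taken before printing.
ASA_ACTIONS = {
    " ": "\n",      # advance one line
    "0": "\n\n",    # advance two lines
    "-": "\n\n\n",  # advance three lines
    "1": "\f",      # skip to the top of the next page
    "+": "\r",      # no advance, overprint the previous line
}


def asa_to_text(records: list[str]) -> str:
    """Convert records with a leading carriage control byte to plain text."""
    output = []
    for record in records:
        control, text = record[:1], record[1:]
        if control not in ASA_ACTIONS:
            raise ValueError("unknown carriage control character %r" % control)
        # Trailing pad spaces of the fixed length record are dropped.
        output.append(ASA_ACTIONS[control] + text.rstrip(" "))
    return "".join(output)


# Hypothetical report: a page heading followed by two detail lines.
report = ["1TITLE", " line one", "0line two"]
print(repr(asa_to_text(report)))
```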

On the mainframe many reports are never printed (or rarely printed) but are placed in a repository for online viewing. This repository is usually managed by a separate, third-party software package. Many of the report management software vendors have a Windows or UNIX version of their software.

Note: this section addresses batch printing. If a terminal printer under the control of CICS is used then the print format maps to the terminal printer specifications and may require additional attention when migrating between the EBCDIC and ASCII environments.

Possibilities & Considerations, Numeric Field Types

During a data migration from a mainframe system to a Windows or UNIX system many of the files may contain records with a variety of fields (or data strings) of various numeric formats that are used by the mainframe or the COBOL programming language. The following links provide additional information about numeric field formats.

Numeric Type Description
Zoned Decimal    This document describes the zoned-decimal format. This is coded in COBOL as USAGE IS DISPLAY and is the default format if the USAGE clause is missing.
Packed Decimal    This document describes the packed-decimal format. This is coded in COBOL as USAGE IS COMPUTATIONAL-3 and is usually coded in its abbreviated form of COMP-3.
Binary       This document describes the binary format. This is coded in COBOL as USAGE IS COMPUTATIONAL and is usually coded in its abbreviated form of COMP. This may also be coded with the keyword BINARY.
Edited Numeric    This document describes the edited numeric format. This is coded in COBOL using an edit mask in the picture clause. An example would be PIC ZZZ.99+.
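
To make the packed-decimal and zoned-decimal layouts concrete, here is a small Python sketch that decodes both. It is an illustration under stated assumptions: the zoned-decimal input is EBCDIC-encoded with the sign carried in the zone of the last byte, sign nibbles B and D are treated as negative and all other sign values as positive, and the result is an unscaled integer because the position of the decimal point comes from the PICTURE clause and is not stored in the data. The function names are hypothetical.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into an unscaled integer."""
    if not raw:
        raise ValueError("empty field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # the last nibble carries the sign
    if sign < 0x0A or any(d > 9 for d in nibbles):
        raise ValueError("not a valid packed-decimal field")
    value = int("".join(str(d) for d in nibbles))
    return -value if sign in (0x0B, 0x0D) else value


def unzone_ebcdic(raw: bytes) -> int:
    """Decode an EBCDIC zoned-decimal (DISPLAY) field with a trailing sign zone."""
    if not raw:
        raise ValueError("empty field")
    digits = [byte & 0x0F for byte in raw]
    if any(d > 9 for d in digits):
        raise ValueError("not a valid zoned-decimal field")
    sign = raw[-1] >> 4  # the zone of the last byte carries the sign
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0x0B, 0x0D) else value


print(unpack_comp3(b"\x12\x34\x5C"))   # packed, positive sign nibble C
print(unpack_comp3(b"\x12\x34\x5D"))   # packed, negative sign nibble D
print(unzone_ebcdic(b"\xF1\xF2\xD3"))  # zoned, negative sign zone D
```

These fields are the reason a record that contains them cannot be converted between EBCDIC and ASCII as if it were text: a byte such as x'5C' is part of a number, not a character.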

Possibilities & Considerations, Micro Focus Directives

This section describes various Micro Focus compiler directives that may be required to control program behavior in the Windows or UNIX environments in a manner compliant with the compiler options and subsequent execution on the Mainframe System.

Directives, File Access Methods, use of Record Buffers before an Open

If a program attempts to access the record buffers defined under the FD of a COBOL program before the file is opened, the result is a 114 error return code. Normally, this memory area is not allocated and available until after the file is opened. The resulting 114 error can be very time consuming to diagnose. To make the record buffers available before a file is opened the NOHOSTFD compiler directive must be used.

Directives, Numeric Formats

Management (i.e. processing, storage and retrieval) of the various numeric formats has been and continues to be a challenge on the mainframe. When transferring data files that contain the various numeric formats from the Mainframe to a Windows or UNIX platform the challenges are transferred along with the files. Micro Focus (on the Windows and UNIX platforms) offers a number of COBOL compiler directives to help deal with the challenges of managing the various numeric formats.

NUMPROC is a mainframe compiler option. When NUMPROC(MIG) is in effect, the compiler generates code that is similar to that produced by OS/VS COBOL. This option can be especially useful if you migrate OS/VS COBOL programs to IBM Enterprise COBOL for z/OS.

Use NUMPROC(MIG) to aid in migrating OS/VS COBOL programs to Enterprise COBOL. When NUMPROC(MIG) is in effect, the following processing occurs at the mainframe:

1. Preferred signs are created only on the output of MOVE statements and arithmetic operations.
2. No explicit sign repair is done on input.
3. Some implicit sign repair might occur during conversion.
4. Numeric comparisons are performed by a decimal comparison, not a logical comparison.

For Micro Focus the use of a mainframe dialect such as DIALECT(ENTCOBOL) will set other compiler directives for mainframe compatibility. For example, the IBMCOMP and NOTRUNC directives will be included when a mainframe dialect is used. This will maintain numeric integrity and size for COMP or BINARY fields. In addition, the use of the directives HOSTNUMMOVE, HOSTNUMCOMPARE, SIGNFIXUP, HOSTARITHMETIC and CHECKNUM will emulate the mainframe NUMPROC(NOPFD) option.

The sequence in which the directives are specified is also important since some directives will set other directives. For example, the DIALECT directive that specifies a mainframe dialect will set CHARSET(EBCDIC). If the desired encoding is ASCII then the CHARSET(ASCII) directive must follow the DIALECT directive.

Directive or Function    Description
DIALECT The DIALECT(ENTCOBOL) should be the first directive specified. This will ensure a mainframe dialect.
CHARSET If the target environment is ASCII-encoded then the CHARSET(ASCII) directive must follow the DIALECT directive that specifies a mainframe dialect.
IBMCOMP In word-storage mode every data item of USAGE COMP or COMP-5 occupies either two bytes or a multiple of four bytes.
NOTRUNC Truncate in binary to the capacity of the allocated storage, on all nonarithmetic stores into COMP, BINARY and COMP-4 items
DE-EDIT The DE-EDIT"1" directive may be required if the program being compiled moves data between edited numeric fields.
NUMPROC(PFD) Given X'sd', where s is the sign representation and d represents the digit, when you use NUMPROC(PFD), the compiler assumes that the sign in your data is one of the three preferred signs:
1. Signed positive or 0: X'C'
2. Signed negative: X'D'
3. Unsigned or alphanumeric: X'F'
The following is a list of the directives used to emulate the NUMPROC(PFD) environment.
HOSTNUMMOVE HOSTNUMCOMPARE NOSIGNFIXUP HOSTARITHMETIC CHECKNUM
NUMPROC(NOPFD) On the mainframe when the NUMPROC(NOPFD) compiler option is in effect, the compiler accepts any valid sign configuration. The preferred sign is always generated in the receiver. NUMPROC(NOPFD) is less efficient than NUMPROC(PFD), but you should use it whenever data that does not use preferred signs might exist. If an unsigned, external-decimal (or zoned-decimal) sender is moved to an alphanumeric receiver, the sign is unchanged (even with NUMPROC(NOPFD)).
The following is a list of the directives used to emulate the NUMPROC(NOPFD) environment.
HOSTNUMMOVE HOSTNUMCOMPARE SIGNFIXUP HOSTARITHMETIC CHECKNUM
DEFAULTBYTE"ii" The "ii" is a decimal value.
Set to DEFAULTBYTE"32" immediately by CHARSET"ASCII".
Set to DEFAULTBYTE"00" immediately by CHARSET"EBCDIC", MS, IBM-MS or PC1.
If you want to specify an EBCDIC space use DEFAULTBYTE"64".
Note 1: If the default byte is to be changed and the CHARSET directive is used then DEFAULTBYTE must follow the CHARSET directive.
Note 2: If a COBOL program processes a table and the index (or subscript for the table) is not initialized this will not produce an abnormal termination on the mainframe. However, if the program is ported to a Micro Focus environment with ASCII encoding the index will be initialized to spaces and this will cause a problem with an invalid subscript error. Changing the default byte to "00" for the ASCII encoded environment will correct the problem. Correcting the program to initialize the index (or subscript) would be a better solution to the problem.
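
The decimal DEFAULTBYTE values and the subscript problem from Note 2 can be checked with a short Python sketch. It is illustrative only. It assumes the uninitialized index is a two-byte, big-endian binary field and uses code page 037 (Python codec cp037) for EBCDIC; the names are hypothetical.

```python
ASCII_SPACE = 32   # DEFAULTBYTE"32", an ASCII space (x'20')
EBCDIC_SPACE = 64  # DEFAULTBYTE"64", an EBCDIC space (x'40')
LOW_VALUE = 0      # DEFAULTBYTE"00", binary zero


def initial_binary_value(default_byte: int, width: int = 2) -> int:
    """Value seen in an uninitialized binary field filled with default_byte."""
    return int.from_bytes(bytes([default_byte]) * width, "big")


# Both decimal values decode to a space in their own encoding.
print(repr(bytes([ASCII_SPACE]).decode("ascii")))
print(repr(bytes([EBCDIC_SPACE]).decode("cp037")))

# An index filled with ASCII spaces holds a large number, not zero.
print(initial_binary_value(ASCII_SPACE))
print(initial_binary_value(LOW_VALUE))
```

A two-byte field filled with ASCII spaces is x'2020', which is far outside the bounds of most tables, whereas a field filled with binary zeros holds zero. This is consistent with the behavior described in Note 2.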

Summary

The purpose of this document is to provide a quick overview of the file formats provided by Micro Focus on a Windows platform. For more information refer to the Micro Focus Web Site at http://www.microfocus.com

Software Agreement and Disclaimer

Permission to use, copy, modify and distribute this software for any commercial purpose requires a fee to be paid to SimoTime Enterprises. Once the fee is received by SimoTime the latest version of the software will be delivered and a license will be granted for use within an enterprise, provided the SimoTime copyright notice appear on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Enterprises.

Permission to use, copy, modify and distribute this software for a non-commercial purpose and without fee is hereby granted, provided the SimoTime copyright notice appear on all copies of the software. The SimoTime name or Logo may not be used in any advertising or publicity pertaining to the use of the software without the written permission of SimoTime Enterprises.

SimoTime Enterprises makes no warranty or representations about the suitability of the software for any purpose. It is provided "AS IS" without any express or implied warranty, including the implied warranties of merchantability, fitness for a particular purpose and non-infringement. SimoTime Enterprises shall not be liable for any direct, indirect, special or consequential damages resulting from the loss of use, data or projects, whether in an action of contract or tort, arising out of or in connection with the use or performance of this software.

Downloads and Links to Similar Pages

You may download or view the complete list of SimoTime Examples at http://www.simotime.com/sim4dzip.htm

Note: You must be attached to the Internet to download a Z-Pack or view the list.

This suite of sample programs describes how to define a Generation Data Group (GDG). Once the GDG is defined the creation of a Generation Data Set (referred to as a generation or GDS) within the group is discussed. The COBOL program is written using the COBOL/2 dialect but works with COBOL for MVS and COBOL/370.

This item will provide a link to  an ASCII or EBCDIC translation table . A column for decimal, hexadecimal and binary is also included.

Check out  The VSAM - QSAM Connection  for more examples of mainframe VSAM and QSAM accessing techniques and sample code.

This document provides a quick summary of the  File Status Key  for VSAM data sets and QSAM files.

Check out  The SimoTime Library  for a wide range of topics for Programmers, Project Managers and Software Developers.

To review all the information available on this site start at  The SimoTime Home Page .

Glossary of Terms

Check out  The SimoTime Glossary  for a list of terms and definitions used in the documents provided by SimoTime.

Comments or Suggestions

If you have any questions, suggestions or comments please call or send an e-mail to: helpdesk@simotime.com

We appreciate your comments and feedback.

About SimoTime Enterprises, LLC

Founded in 1987, SimoTime Enterprises is a privately owned, Limited Liability Corporation located in Novato, California. We specialize in the creation and deployment of business applications using new or existing technologies and services. We have a team of individuals who understand the broad range of technologies being used in today's environments. This includes the smallest thin client using the Internet and the very large mainframe systems. There is more to making the Internet work for your company's business than just having a nice looking Web site. It is about combining the latest technologies and existing technologies with practical business experience. It's about the business of doing business and looking good in the process. Quite often, reaching larger markets or providing a higher level of service to existing customers requires the newer Internet technologies to work in a complementary manner with existing corporate mainframe systems. Whether you want to use the Internet to expand into new market segments or as a delivery vehicle for existing business functions simply give us a call or check the web site at http://www.simotime.com

