Wednesday, 5 March 2014

Loading flat file using SQL Loader

SQL Loader is an Oracle tool used to load data from flat files into Oracle tables. It allows the user to selectively load certain columns/rows. Although a programmer could duplicate the functionality of SQL Loader by writing his/her own load program, SQL Loader has the advantages of flexibility, ease of use, and performance. It is often used for migrating data from a non-Oracle source system to an Oracle database.
The two important components of SQL Loader are as follows:
Data File: A flat file containing the data to be loaded into the Oracle table. Each line contains one record to be loaded into the table, and each field in the record is separated by a separator character.
Below is a datafile containing 2 records, each row containing 3 fields separated by a pipe (|).
1|Ninad|Kamerkar
2|Steven|Clau
Control File: A text file written in a language that SQL Loader understands. It tells SQL Loader where to find the data, how to parse and interpret the data, where to insert the data, and more. In other words, the control file is a mapping file containing the mapping from the datafile to the Oracle table.

Below is the syntax for the control file.

LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)
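For example, a minimal control file for the two-record sample above might look like the following, assuming a hypothetical target table PERSON with columns PERSON_ID, FIRST_NAME and LAST_NAME, and assuming the datafile was saved as person.dat.

LOAD DATA
INFILE 'person.dat'
APPEND INTO TABLE PERSON
FIELDS TERMINATED BY '|'
(PERSON_ID, FIRST_NAME, LAST_NAME)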
Step-by-step approach for learning SQL Loader
Creating the Target Table
First we need to create a target Oracle table into which SQL Loader will load the data from the datafile. Execute the DDL below in your Oracle database schema.

CREATE TABLE Employee (
    Empno NUMBER(4),
    Name VARCHAR2(10),
    Age NUMBER(4),
    Dob DATE,
    Salary NUMBER(8,2),
    Emp_status VARCHAR2(5),
    Job_desgn VARCHAR2(20)
);

Creating the Data File

Below is the datafile that will be loaded into the Employee table. Save this file as Employee.LST. The first line in the datafile is the header record; we need to skip this line while loading the file.


"NAME"|"AGE"|"SALARY"|"DOB"|"DEPARTMENT"|"JOB DESIGNATION"
"George"|"28"|"50000.30"|"02/02/1957"|"Soft. Dept."|"Soft. Eng."
"Steven"|"38"|"50000"|"08/11/1946"|"Accounts Dept"|""

Creating the Control File
Below is the control file for the above datafile. Save this file as Employee.ctl.

LOAD DATA
INFILE 'Employee.LST'
APPEND
INTO TABLE Employee
FIELDS TERMINATED BY '|'
ENCLOSED BY '"'
TRAILING NULLCOLS
(
    NAME CHAR
    ,AGE INTEGER EXTERNAL
    ,SALARY DECIMAL EXTERNAL
    ,DOB DATE "MM/DD/YYYY"
    ,EMP_STATUS CONSTANT 'A'
    ,EMPNO SEQUENCE(MAX, 1)
    ,SKIP_FIELD FILLER
    ,JOB_DESGN "NVL(:JOB_DESGN,'New Employee')"
)
Line 1: LOAD DATA specifies the beginning of the control file.
Line 2: INFILE specifies the name of the input file(s). An asterisk (*) following this keyword indicates that the data records are in the control file itself.
Line 3: This specifies the table operation to be performed. The available options are:
INSERT puts rows into an empty table.
APPEND adds rows to a table.
REPLACE and TRUNCATE delete all rows in the table and place the current data records into the table.
Line 4: INTO TABLE specifies the target table into which the data is loaded.
Line 5: FIELDS TERMINATED BY specifies the delimiter used in the datafile. In our example we are using a pipe (|).
Line 6: ENCLOSED BY specifies the character with which each field value is enclosed. In our example we are using double quotes ("). This is an optional parameter.
Line 7: TRAILING NULLCOLS tells SQL Loader how to handle short records with missing data. If the record ends before the field is found, then SQL Loader could either treat the field as null or generate an error. It uses the presence or absence of the TRAILING NULLCOLS clause to determine the action.
Lines 9 – 16: These lines contain the actual mapping between the datafile fields and the target table columns. Here we say that the 1st field in the datafile should be put into the Name column of the Employee table, the 2nd field into the Age column, and so on.
Line 12: Since the 4th field is a date, we specify the format in which the date appears in the datafile.
Line 13: You might not receive some data in the datafile and may want to default a value into the target table column while inserting the record. The CONSTANT keyword followed by a constant value lets us default the column value.
Line 14: Here we use the built-in SQL Loader function SEQUENCE(MAX, 1), which generates a sequence for the target table: it takes the maximum EMPNO value in the Employee table and adds 1 to it to generate each new value.
Line 15: We might want to ignore some fields in the datafile. The keyword FILLER identifies the department field (5th field in the datafile) as one that is not loaded into the database. The name does not matter; you can change SKIP_FIELD to any desired value.
Line 16: Here we use the Oracle function NVL, which checks whether JOB_DESGN (6th field in the datafile) is empty. If it is empty, we use the default text 'New Employee'.
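To make the mapping concrete, the second data record (Steven) would be loaded roughly as if the following INSERT were executed. The EMPNO value of 2 is an assumption based on an initially empty Employee table, so SEQUENCE(MAX, 1) produces 1 for George's record and 2 for Steven's; this is only an illustration of the mapping rules, not literal SQL Loader output.

-- Illustration only: the field mapping applied to the record
-- "Steven"|"38"|"50000"|"08/11/1946"|"Accounts Dept"|""
-- The department field is dropped by FILLER, EMP_STATUS comes from the CONSTANT,
-- and the empty job designation is defaulted to 'New Employee' by NVL.
INSERT INTO Employee (Empno, Name, Age, Dob, Salary, Emp_status, Job_desgn)
VALUES (2, 'Steven', 38, TO_DATE('08/11/1946', 'MM/DD/YYYY'), 50000, 'A', 'New Employee');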

Loading Your Data
Once the datafile and control file are ready, we can perform the load. We use the sqlldr command to load the datafile as shown below; the whole command is entered on a single line and is wrapped here only for readability.


sqlldr <database_user>/<database_password>@<SID>
SKIP=1 ERRORS=1000 ROWS=100 CONTROL=Employee.ctl
LOG=Employee.log BAD=Employee.bad
Below is an explanation of the parameters passed to the sqlldr command.
SKIP: Number of logical records to skip (default: 0). SKIP=1 skips the first record, which is the header.
ERRORS: Number of errors that will terminate the load (default: 50).
ROWS: Number of rows in the conventional path bind array, or between direct path data saves (conventional path default: 64, direct path default: all).
CONTROL: Control filename.
LOG: Log filename.
BAD: Bad filename. During the load process, if SQL Loader finds records that could not be loaded, they are written to the file specified with this parameter. These records can then be reviewed to find the problem. Conditions that may cause a record to be rejected include integrity constraint violations, datatype mismatches, and other errors in field processing.
DISCARD: Discard filename. SQL Loader provides options to selectively reject records based on special criteria. These criteria are defined in the control file as part of a WHEN clause. If SQL Loader encounters a record that fails a specified WHEN clause, the record is placed in the discard file (a sample WHEN clause is sketched after this list).
DISCARDS: Number of discards that will terminate the load (default: all).
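Below is a rough sketch of such a WHEN clause applied to our Employee load. The condition on the filler department field is purely illustrative and is not part of the original control file above: it loads only 'Soft. Dept.' records and sends every other record to the discard file.

LOAD DATA
INFILE 'Employee.LST'
APPEND
INTO TABLE Employee
WHEN (SKIP_FIELD = 'Soft. Dept.')
FIELDS TERMINATED BY '|'
ENCLOSED BY '"'
TRAILING NULLCOLS
(
    NAME CHAR
    ,AGE INTEGER EXTERNAL
    ,SALARY DECIMAL EXTERNAL
    ,DOB DATE "MM/DD/YYYY"
    ,EMP_STATUS CONSTANT 'A'
    ,EMPNO SEQUENCE(MAX, 1)
    ,SKIP_FIELD FILLER
    ,JOB_DESGN "NVL(:JOB_DESGN,'New Employee')"
)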
When we execute the sqlldr command, the data is loaded into the Employee table. SQL Loader also produces a detailed log file (Employee.log in this example) containing a summary of the load and any errors that occurred during loading.
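Once the load finishes, a quick query against the Employee table (an ordinary SQL check, nothing SQL Loader specific) can confirm the constant status, the generated EMPNO values, and the defaulted job designation described above.

-- Verify the loaded rows: EMP_STATUS should be 'A' for every row,
-- and rows with an empty job designation in the datafile should show 'New Employee'.
SELECT Empno, Name, Age, Dob, Salary, Emp_status, Job_desgn
FROM Employee
ORDER BY Empno;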
Advantages of using SQL Loader
1. Has a powerful data parsing engine which puts little limitation on the format of the data in the datafile.
2. Can load data from multiple data files during the same load session (see the sketch after this list).
3. Can load data into multiple tables during the same load session.
4. Is character set aware (you can specify the character set of the data).
5. Can selectively load data (you can load records based on the records' values).
6. Can manipulate the data before loading it, using SQL functions.
7. Can generate unique sequential key values in specified columns.
8. Can use the operating system's file system to access the datafile(s).
9. Does sophisticated error reporting, which greatly aids troubleshooting.
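As a rough sketch of points 2 and 3 (multiple data files and multiple tables in one session), a control file can list several INFILE clauses and several INTO TABLE clauses. The file names, the record-type flag in the first field, and the Manager table below are all hypothetical and are not part of the Employee example.

LOAD DATA
INFILE 'staff1.dat'
INFILE 'staff2.dat'
APPEND
INTO TABLE Employee
  WHEN (rec_type = 'E')
  FIELDS TERMINATED BY '|'
  (
    rec_type FILLER CHAR,              -- record-type flag, not loaded
    NAME CHAR,
    AGE INTEGER EXTERNAL
  )
INTO TABLE Manager
  WHEN (rec_type = 'M')
  FIELDS TERMINATED BY '|'
  (
    rec_type FILLER POSITION(1) CHAR,  -- POSITION(1) restarts field scanning at the start of the record
    NAME CHAR,
    AGE INTEGER EXTERNAL
  )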

Summary
Oracle SQL Loader processes the input data files according to the directions in a text file called the control file, which specifies the names and locations of the source data files, the format of the data to be loaded, and the data transformations to be performed during loading.
As the input files are processed, any records that do not pass format checks are written to the bad file, and any records that do not meet the specified selection criteria are written to the discard file.
Records that pass both the format and selection criteria are written to the specified target tables, but they may still be rejected, for example because of constraint violations, in which case they are written to the bad file along with the records rejected for being invalid.

A log file containing a detailed summary of the load, including a description of any errors that occurred during the load, is also produced.
