Wednesday, 15 August 2012

NoClassDefFoundError and ClassNotFoundException errors when running standalone Java programs

This post is intended as a collection of various cases related to Java CLASSPATH issues that cause NoClassDefFoundError and ClassNotFoundException errors that are often mysterious and sometimes quite difficult to troubleshoot. 
The addressed cases go beyond trivial causes of missing jar files or mistyped package names.

Case I - A standalone Java program is packaged into a jar file. The JAR includes a seemingly correct manifest specifying the main class and the program classpath as in the sample below:

MANIFEST.MF file:


Manifest-Version: 1.0
Main-Class: com.vtesc.batchjobs.daily.txprocessor.Launcher
Class-Path: . ./txprocessor.jar ./lib/spring.jar ./lib/commons-logging.jar ./lib/ojdbc6.jar d:/common/lib/dbaccess.jar


The txprocessor.jar file is correctly built by an Ant build script and has a totally correct structure and contents:



Everything seems to be perfectly fine but when run, it fails with classloader errors (Java version "1.6.0_29"):


D:\batchjobs\txprocessor>java -jar txprocessor.jar
Exception in thread "main" java.lang.NoClassDefFoundError: com/vtesc/batchjobs/daily/txprocessor/Launcher
Caused by: java.lang.ClassNotFoundException: com.vtesc.batchjobs.daily.txprocessor.Launcher
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: com.vtesc.batchjobs.daily.txprocessor.Launcher. Program will exit.

Cause:
The returned error displaying the "Could not find the main class" message can be actually quite misleading as it may make you to suspect some problem with the main class. However, in this case it has absolutely nothing to do with the main class. The cause of the error is actually in the Manifest file on the Class-Path: line:

d:/common/lib/dbaccess.jar


Hard-coding an absolute path to a library is a very bad practice in itself but specifying the path without indicating a protocol causes the total failure. By default, the Java platform uses the java.net.URLClassLoader class for loading program classes. As its name implies, the URLClassLoader uses another system class, java.net.URL for specifying the resource location. The URL class complies with a number of RFC documents and it requires the file:/ protocol to be specified for file system resources.

The problem can be corrected by prefixing the absolute path to the dbaccess.jar file with the file:/ protocol as shown below:


Manifest-Version: 1.0
Main-Class: com.vtesc.batchjobs.daily.txprocessor.Launcher
Class-Path: . ./txprocessor.jar ./lib/spring.jar ./lib/commons-logging.jar ./lib/ojdbc6.jar file:/d:/common/lib/dbaccess.jar


Yet, a better way would be not to use absolute paths. Case in point, using the Windows specific path here will confine this Java program to the Windows platform which goes against the Java platform-independence principle.
But we live in a world which is far from perfect and sometimes one finds himself forced to follow requirements even if they go against best practices and common sense.

Thursday, 9 August 2012

Samples of control files for Oracle SQL Loader

This is intended as a quick reference to provide a set of templates for creating SQL Loader control files for most frequently used scenarios.

Sample 1:
Loading data from a TAB-delimited file: FIELDS TERMINATED BY X'09'
The sample also demonstrates the following techniques:
  • skip lines: OPTIONS (SKIP=1)  instructs the SQL Loader to skip the first line (the number specifies the number of lines)
  • auto-generate values for a primary key column: SEQUENCE(MAX, 1)
  • load data by adding more rows without truncating the existing data: APPEND
  • load into DATE type columns: "TO_DATE(:TX_DATE,'MM/DD/YYYY')"

-- ============================================================
-- stocktx.ctl
-- SQLLDR Control File for data loading to STOCKTX table.
--
-- The input file has the TAB-separated format with a single line per table row.
--
-- To run (sample):
--   sqlldr batchjob/<pwd>@oradb control=stocktx.ctl errors=1000 log=logs/stocktx_load.log

OPTIONS (SKIP=1) load data
  infile 'data/stocktx_data.tab'
  badfile 'logs/stocktx_data.bad'
  discardfile 'logs/stocktx_data.discard'
  APPEND
  into table portfolio.stocktx
  FIELDS TERMINATED BY X'09'
  trailing nullcols
(
    TX_ID        SEQUENCE(MAX, 1),
    EXCHANGE,
    SYMBOL,
    QTY,
    PRICE,
    TX_DATE      "TO_DATE(:TX_DATE,'MM/DD/YYYY')"
)

Sample 2:
Loading data from a multicharacter-delimited file (^|^) file: FIELDS TERMINATED BY '^|^'
The sample also demonstrates the following techniques:
  • delete existing data prior to the load: TRUNCATE
  • specify the data file name on the command line: data=addr.dsv
  • specify log files pathnames via command line parameters (as opposite to hard-coding in the control file).
  • evaluate the source values and transform on a condition, e.g if the Postal Code is "N/A", replace it with NULL: ZIP_CODE "decode(:ZIP_CODE, 'N/A', '', :ZIP_CODE)"
  • apply the current time/date to a DATE column: SYSDATE 
  • CONSTANT - allows to set a column to a constant value; this value is hard-coded into the control script file, not from the input data file. The value is treated as a character string; SQLLDR will attempt to convert it to the column type if required.
-- ============================================================
-- addr.ctl
-- SQLLDR Control File for data loading to ADDR table.
--
-- The input file the '^|^' sequence of characters to separate values.
--
-- To run (sample):
-- sqlldr batchjob/@oradb control=addr.ctl errors=100 log=logs/addr_load.log bad=logs/addr.bad discard=logs/addr.discard data=addr.dsv
--
-- Note: the logs directory needs to be created prior to running the script.

OPTIONS (SKIP=0)

load data
  TRUNCATE
  into table ADDR
  FIELDS TERMINATED BY '^|^'
  trailing nullcols
(
    ADDR_ID SEQUENCE(MAX, 1),
    CITY,
    STREET_NUM,
    STREET_NAME,
    STREET_DIR,
    ZIP_CODE    "decode(:ZIP_CODE, 'N/A', '', :ZIP_CODE)",
    MODIFIED_ON SYSDATE,

    BATCH_ID    CONSTANT 1
)

This is a table structure for the sample above:
CREATE TABLE ADDR (
  addr_id      NUMBER(10)     NOT NULL,
  city         VARCHAR2(40)   NOT NULL,
  street_num   NUMBER         NOT NULL,
  street_name  VARCHAR2(40)   NOT NULL,
  street_dir   VARCHAR2(4)    NOT NULL,
  zip_code     VARCHAR2(7),
  modified_on  DATE           NOT NULL,
  batch_id     NUMBER,
--  
  CONSTRAINT addr_id_pk PRIMARY KEY (addr_id)
)
/
Sample 3:
SQLLDR control file to load data from a comma-separated file (,) where field values may be enclosed in double-quotes (").


Control file explanation:
  • FIELDS TERMINATED BY ',' - specifies that the fields are delimited by comma.
  • OPTIONALLY ENCLOSED BY '"' - specifies that field values may (or may not) be enclosed in double quotes, for example: "2269 Lake Shore Blvd. W., Toronto, ON"
    This is often required if the data can contain characters that are used for field separation, the comma character in this particular example.
  • SEQUENCE(COUNT, 1) - this is usually used to generate unique values for the table primary key. The COUNT parameter specifies that the 1st value to be the number of records in the table before the load plus 1.
  • --  INTO TABLE ADDR.STREET_VALUES_BAK - a commented out line; it is handy during the development phase when the control file can be re-used to load data into different tables.
  • "UPPER(:ALT_STREET_NAME)" - convert the values to the upper case.
  • CREATE_DATE "TO_DATE(TO_CHAR(SYSDATE, 'YYYYMMDD'), 'YYYYMMDD')" - this is useful to load only the date part of the current time into DATE type columns.
    SYSDATE returns the current time which is formatted into a string containing Year, Month, and Day which in turn converted back to DATE type.
  • BATCH_ID EXPRESSION "(SELECT MAX(BATCH_ID) FROM ADDR.BATCH_ID WHERE TABLE_NAME = 'STREET_ALTERNATIVE')" - useful to insert a constant value for each data load, for a example a Batch ID or Run ID.
    A separate table is required, of course, to maintain load data IDs. A new record can be inserted into this table from a shell script prior to invoking the SQLLDR script. See a DDL and INSERT statement below.
-- =============================================================
-- street_alternative.ctl
-- Loading alternative street names.
-- SQLLDR Control File for data loading into ADDR.STREETS_ALTERNATIVE table.
--
-- The input file is |-separated in double quotes.
-- sqlldr vit1@devdb1 control=street_alternative.ctl errors=1 log=logs/street_alternative.log bad=logs/street_alternative.bad discard=logs/street_alternative.discard data=street_alternative.txt
--

OPTIONS (SKIP=1)

LOAD DATA
  APPEND
  INTO TABLE ADDR.STREET_ALTERNATIVE
--  INTO TABLE ADDR.STREET_VALUES_BAK
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"'
  TRAILING NULLCOLS (
  REC_ID            SEQUENCE(COUNT, 1),
  STREET_ID,
  ALT_STREET_NAME   "UPPER(:ALT_STREET_NAME)",
  CREATE_DATE       "TO_DATE(TO_CHAR(SYSDATE, 'YYYYMMDD'), 'YYYYMMDD')",
  BATCH_ID          EXPRESSION "(SELECT MAX(BATCH_ID) FROM ADDR.BATCH_ID WHERE TABLE_NAME = 'STREET_ALTERNATIVE')"
)



DDL for the tables used in the sample #3 above:
CREATE TABLE ADDR.STREET_ALTERNATIVE (
  rec_id          NUMBER       NOT NULL,
  street_id       NUMBER       NOT NULL,
  alt_street_name VARCHAR2(40) NOT NULL,
  create_date     DATE         NOT NULL,
  batch_id        NUMBER       NOT NULL,
--  
  CONSTRAINT rec_id_pk PRIMARY KEY (rec_id)
)
/
CREATE TABLE ADDR.BATCH_ID (
  rec_id          NUMBER        NOT NULL,
  table_name      VARCHAR2(255) NOT NULL,
  batch_id        NUMBER        NOT NULL,
  create_date     DATE          NOT NULL,
 --  
  CONSTRAINT batch_id_pk PRIMARY KEY (rec_id)
)
/
-- A statement to insert a record into the BATCH_ID table for
-- the next value to be used in a data load:
insert into ADDR.BATCH_ID values (1, 'STREET_ALTERNATIVE', 
  (select nvl(max(batch_id), 0) from ADDR.BATCH_ID
    where table_name = 'STREET_ALTERNATIVE') + 1, sysdate)
/