/*
Beginning with 12c, csscan is replaced with DMU (Database Migration Assistant for Unicode), a GUI tool located at
$ORACLE_HOME/dmu. See ./DmuCharacterSetTool.txt on this website.
*/


			Use csscan to Check for Possible Characterset Conversion Errors


* Routine basics: How?

Before migrating schemas across DBs of different characterset, run csscan.

Do once in the source database to prepare csmig:
sqlplus / as sysdba
@?/rdbms/admin/csminst.sql

(Directory objects log_file_dir, data_file_dir are useless. Ignore the error. See Note 745809.1)

Scan the schema appuser specifying target database characterset:
csscan dbauser tochar=al32utf8 user=appuser
or
csscan \"sys as sysdba\" tochar=al32utf8 user=appuser (syntax on Windows only)

On Linux, set environment variable before running csscan:
export LD_LIBRARY_PATH=$ORACLE_HOME/lib
csscan dbauser tochar=al32utf8 user=appuser
or to scan the database:
csscan dbauser tochar=al32utf8 full=y

If you get heap memory error on Linux, e.g.

. process 1 scanning APPUSR1.TAB1[AABhFyAA4AAAAGxAAA]
. process 1 scanning APPUSR2.TAB2[AACCTZAA+AAAAjxAAA]*** glibc detected *** csscan: corrupted double-linked list: 0x0000000009d627c0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x394e472150]
/lib64/libc.so.6(__libc_calloc+0xcd)[0x394e47382d]
/u01/app/oracle/product/10.2.0/db/lib/libclntsh.so.10.1(slwmmgetmem+0x28)[0x2adc7cdaa4f8]

disable the heap consistency check (per http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Heap-Consistency-Checking.html):
MALLOC_CHECK_=0 csscan dbauser full=y tochar=al32utf8


Check result in scan.txt. The best is
All character type application data are convertible to the new character set
If you see
Some character type application data are not convertible to the new character set
then it's bad.
If you see "lossy conversion" in scan.err, it's even worse.

After each scan, it's better to rename the scan.txt to a different name to avoid being overwritten.

Ref:
* "Scan Summary"
  http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm#sthref1651
* "How to Handle Convertible or Lossy Data in the Data Dictionary"
  http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm#i1013515
* Metalink 260192.1 "Changing WE8ISO8859P1/ WE8ISO8859P15 or WE8MSWIN1252 to (AL32)UTF8" 



* Generate DDL's to widen columns

scan.err has lines like the following:

User  : SMARTSTREAM
Table : GL_DETAIL_FY11
Column: DESC_3
Type  : VARCHAR2(10)
Number of Exceptions         : 75
Max Post Conversion Data Size: 17

If you run impdp without correcting the problem, you'll get errors such as

ORA-02374: conversion error loading table "SMARTSTREAM"."GL_DETAIL_FY11"
ORA-12899: value too large for column DESC_3 (actual: 17, maximum: 10)
ORA-02372: data for row: DESC_3 : 0X'51494E47204D41204D414947BFBFBFBFBF'

The solution is to create GL_DETAIL_FY11 in the target database without importing rows. Run the DDL below to widen the column:

alter table SMARTSTREAM.GL_DETAIL_FY11 modify DESC_3 VARCHAR2(17);

Then you import the data specifying table_exists_action=truncate, append, or replace for impdp (ignore=y for imp).

Manually constructing the DDL's is tedious and error-prone if you have too many. Use this script to automate:

$ cat csscan2ddl.pl
#!/usr/bin/perl -w
#csscan2ddl.pl: scan.err created by csscan contains information that can 
#be used to construct DDL's to increase column width in the target 
#database. Script csscan2ddl.pl automates the process.
#2012-02, Yong Huang (yong321@yahoo.com)
#Usage: csscan2ddl.pl scan.err

while (<>) {
if (/^User  : (.*$)/)
 { print "alter table $1." }
elsif (/^Table : (.*$)/)
 { print "$1 modify " }
elsif (/^Column: (.*)/)
 { print "$1 " }
elsif (/^Type  : (\w+)/)
 { print "$1(" }
elsif (/^Max Post Conversion Data Size: (\d+)/)
 { print "$1);\n" }
}
$ ./csscan2ddl.pl scan.err
alter table SMARTSTREAM.GL_DETAIL_FY11 modify DESC_3 VARCHAR2(17);
alter table SMARTSTREAM.GL_DETAIL_FY10 modify DESC_2 VARCHAR2(17);
alter table SMARTSTREAM.GL_COBASECENTER_DESC modify CO_BASE_CENTER_DESCRIPTION CHAR(41);
...

You may redirect the screen output to a file for better reading, preferably piped through sort first.

The easiest way to migrate the data is to run the above DDL's in the source database so the columns are widened before 
export/import, which will work transparently later. 

If the above DDL is like "... VARCHAR2(<a number greater than 4000>)", then the only solution is to change it to CLOB, or 
temporarily modify the data to make it shorter and update it back once the import to the target database is completed.

Sometimes scan.err reports a column section where "Max Post Conversion Data Size" is *smaller* than the column width 
defined in the database. You can ignore those columns. My Perl script currently still includes them.