Database Migration Assistant for Unicode

(Note: If the database is pre-12c, use csscan instead. See ./csscan.txt on this website.)

$ sqlplus sys@testdb as sysdba   <-- my database is a PDB called testdb, using WE8MSWIN1252 character set
SQL> @?/rdbms/admin/prvtdumi.plb <-- must be run in the PDB that needs character set conversion, not the CDB

The example here runs DMU on the Oracle server, which is Linux. Run an X Windows server on your local machine, e.g. your laptop. Then on the Oracle server

$ export DISPLAY=your_laptop:0   <-- replace your_laptop with the address of the machine running the X server

The dmu.sh script has no executable bit by default. Add that and launch it.

$ chmod u+x /u01/app/oracle/product/19.0.0/db/dmu/dmu.sh   #or chmod 755
$ /u01/app/oracle/product/19.0.0/db/dmu/dmu.sh

1. Create a connection to the PDB database; user sys is preferred.
2. Create the migration repository: Migration -> Configure DMU Repository. After you select the target character set, this step takes a long time because DMU scans all database files.
3. Scan the database. Choose Application Schemas and among them, uncheck internal schemas or any schema not needed. Reduce the parallel degree; the default 16 may be too high if the server doesn't have many CPUs.
4. Create the report: Migration -> Problem Data Report (or Database Scan Report). Choose Application Schemas. The report is created under /home/oracle/.dmu/log by default, as an Excel file, e.g. ProblemDataReport_testdb_20210723093821.xls.

If there is no unconvertible data, but there is data that will exceed the defined column widths in the target database you'll import the data into, we can either pre-create the tables in the target database with the columns widened, or widen the columns in the current (source) database testdb.[note1] The following is an easy way to generate the SQLs or DDLs that widen the columns.

Open the report with Excel and go to the sheet "Over Column Limit (Scheduled)". To generate SQLs to widen the columns in the target database, create a copy of this worksheet and delete these 4 columns

Rowid, Current Data Length, Cell Data, Hexadecimal Value

and leave only these 5 columns

Schema Name, Table Name, Column Name, Data Type, Post-Conversion Data Length

Copy this new worksheet data (press ^A, ^C), and paste (^V) into a vi window. Remove the first header line and the last empty line (and the lines whose table names begin with "BIN$" if any). Save to a file named, say, dmurpt.txt.

Create the Perl script dmu2ddl.pl

--- BEGIN CUT ---
#!/usr/bin/perl
#dmu2ddl.pl: DMU (Database Migration Assistant for Unicode) report
#contains information that can be used to construct DDL's to
#increase column widths in the target database before importing
#data from a non-Unicode database. Script dmu2ddl.pl automates the
#process. The input to this script is the sorted output of the
#DMU report, named dmurpt_sorted.txt, 5 columns on each line:
#schema, table, column, datatype, expected width
#separated by tabs
#2021, Yong Huang (yong321@yahoo.com)
#Usage: ./dmu2ddl.pl
open COL, "< dmurpt_sorted.txt" or die "Can't open dmurpt_sorted.txt: $!";
while (<COL>)
{ chomp; #remove the trailing newline so it won't end up inside the printed DDL
  @col=split /\t/;
  if (!defined $c0) #first line, previous line's values not defined yet, so just assign values
  { $c0=$col[0]; $c1=$col[1]; $c2=$col[2]; $c3=$col[3]; $colmaxw=$col[4];
    #the 5 columns are: schema, table, column, datatype, expected width, e.g.
    #FM	L_FM_FACT	DESC_2	VARCHAR2(12)	17
  }
  else
  { if ($c0 eq $col[0] && $c1 eq $col[1] && $c2 eq $col[2] && $c3 eq $col[3]) #$c0,$c1,$c2,$c3 are last line's values
    { $colmaxw=$colmaxw>$col[4]?$colmaxw:$col[4]; #greater of the widths seen so far for this column
    }
    else
    { #Now we come to a line for a different column. Let's print the DDL for the previous column,
      #with the max column width we finally determined.
      #But the datatype in DMU's output has the currently defined width e.g. "VARCHAR2(12)". Need to remove "(12)" here
      $c3=~s/\(.*//; #$c3 is the DMU's datatype, need to remove "()"
      print "alter table $c0.$c1 modify $c2 $c3($colmaxw);\n";
      $colmaxw=$col[4]; #start tracking the new column's width
    }
    $c0=$col[0]; $c1=$col[1]; $c2=$col[2]; $c3=$col[3];
  }
}
#after reading all lines of dmurpt_sorted.txt, we need to output the DDL for the last column
$c3=~s/\(.*//; #remove "()"
print "alter table $c0.$c1 modify $c2 $c3($colmaxw);\n";
close COL;
--- END CUT ---

and chmod 755 dmu2ddl.pl. Then

$ sort dmurpt.txt > dmurpt_sorted.txt
$ ./dmu2ddl.pl
alter table FM.L_FM_FACT modify DESC_2 VARCHAR2(24);
alter table FM.L_FM_FACT modify DESC_3 VARCHAR2(20);
alter table FM.L_FM_FACT_MVW modify DESC_2 VARCHAR2(17);
...
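Instead of copying the DDLs from the screen, you can capture the script's output in a file and run it in the target database; a minimal sketch, where the file name widen_columns.sql and the connect string targetdb are made-up examples:

$ ./dmu2ddl.pl > widen_columns.sql
$ sqlplus sys@targetdb as sysdba
SQL> @widen_columns.sql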
Then you pre-create those tables in the target database with those columns widened to the widths shown above.[note2] Or you can even widen the columns in the current (source) database, export the data, and then restore, i.e. shrink, the columns back. But read [note1] below if you choose to modify the columns in the source database.

----------------------------------------------------------------------------------------------------

Possible bug in DMU: If the data type is CHAR, "Post-Conversion Data Length" in the DMU report may fall short. For example, my table GLDTL.GL_COBASECENTER_DESC has column CO_BASE_CENTER_DESCRIPTION CHAR(39), and DMU's Post-Conversion Data Length for it is 41 (there's only one line in the spreadsheet). So I ran on the pre-created empty table

alter table GLDTL.GL_COBASECENTER_DESC modify CO_BASE_CENTER_DESCRIPTION CHAR(41);

in the AL32UTF8 character set target database. But importing this table into the target database threw these errors:

ORA-02374: conversion error loading table "GLDTL"."GL_COBASECENTER_DESC"
ORA-12899: value too large for column CO_BASE_CENTER_DESCRIPTION (actual: 42, maximum: 41)
ORA-02372: data for row: CO_BASE_CENTER_DESCRIPTION : 0X'42592054484520544F424143434F2043415243494E4F47454E'

It means we have to widen the column to CHAR(42) instead. Since this is the CHAR data type, you don't want to widen the column beyond what's needed, unlike VARCHAR2, which allocates only as much space as the actual data occupies (CHAR blank-pads every value to the full declared width). This is one major reason CHAR is bad; in fact CHAR should always be replaced by VARCHAR2, but that's a different story.
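If you want to double-check DMU's estimate, you can compute the post-conversion byte length yourself in the source database with LENGTHB and CONVERT; a sketch, using my problem column as the example (CONVERT takes the target character set as the second argument and defaults the source character set to the database's):

SQL> select max(lengthb(convert(CO_BASE_CENTER_DESCRIPTION, 'AL32UTF8')))
  2  from GLDTL.GL_COBASECENTER_DESC;

Judging from the import error above, this should return 42 for my data rather than the 41 in the DMU report.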
----------------------------------------------------------------------------------------------------

[note1] Widening columns in the source database carries a small risk. After you alter the live tables to wider columns but before you export the data, users may insert rows or update existing rows that exceed the originally designed column widths. This can be prevented if all users must go through an app to connect to the database and the app checks the length of the user input, or if no user inserts or updates data during this period.

Another concern is that, if the tables have many rows and you want to restore the column widths after the export from the source database, the "alter table" command to shrink the widths will lock the tables for quite some time, or will wait for the lock if the tables already have an incompatible lock on them.

Trick: If you need to modify more than one column of a table, run the "alter table" DDL command only once, e.g.

alter table FM.L_FM_FACT modify DESC_2 VARCHAR2(12) modify DESC_3 VARCHAR2(10);

Although the output of my dmu2ddl.pl shows two "alter table" commands for programming and viewing convenience, widening VARCHAR2 (but not CHAR) columns only modifies the data dictionary without scanning and modifying the table, and completes in well under a second, so it's not a problem. Each "alter table" still locks the table, though.

[note2] If the tables have referential constraints between them, the constraints should be disabled in the target database before you import data into the pre-created empty tables. Otherwise, you would get ORA-31693 (Table data object failed to load/unload and is being skipped) and ORA-02291 (integrity constraint violated).
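An easy way to produce those statements is to generate them from the data dictionary; a sketch, where the owner list is an example you'd adjust to your application schemas (run the same query with "enable" in place of "disable" to generate the re-enable statements after the import):

SQL> select 'alter table '||owner||'.'||table_name||' disable constraint '||constraint_name||';'
  2  from dba_constraints
  3  where constraint_type = 'R' and owner in ('FM','GLDTL');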