Unicode support

Yesterday I came across an amusing blog post about Unicode support in Documentum (I have no idea why it is titled “DOCUMENTUM PROBLEMS AND HOW TO FIX THEM: #1” when it does not contain any solution), and now I would like to share my view of the problem.

It is not clear why that blog post refers to “CS-49851 – “Server does not recognize a UTF-8 enabled database and unnecessarily errors on attribute length””, because I have seen other related CRs dating from 2005 or so. However, I can explain why OpenText will never implement proper Unicode support in Documentum.

Currently Documentum supports four database engines:

  • MSSQL
  • Oracle
  • DB2
  • PostgreSQL

Which database do you think is the most problematic from a Unicode perspective? To answer this question we must understand what “varchar(n)” means in each database:

database | data type | description
MSSQL | varchar(n) | Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
MSSQL | nvarchar(n) | Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
Oracle | varchar2(n) | The VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column. For each row, Oracle Database stores each value in the column as a variable-length field unless a value exceeds the column’s maximum length, in which case Oracle Database returns an error. Using VARCHAR2 and VARCHAR saves on space used by the table.
DB2 | varchar(n) | Varying-length character strings with a maximum length of n bytes. n must be greater than 0 and less than a number that depends on the page size of the table space. The maximum length is 32704.
DB2 | vargraphic(n) | Varying-length graphic strings. The maximum length, n, must be greater than 0 and less than a number that depends on the page size of the table space. The maximum length is 16352.
PostgreSQL | varchar(n) | SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
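The difference between byte, 2-byte-unit, and character semantics in the descriptions above can be seen by measuring one string three ways. The following is only a rough sketch in Python: the helper names `lengths` and `fits` are made up for illustration, and the `semantics` labels are my own shorthand, not anything a database or Documentum exposes.

```python
def lengths(s):
    """Measure a string the three ways a length limit n may be interpreted."""
    return {
        "code_points": len(s),                            # character semantics
        "utf8_bytes": len(s.encode("utf-8")),             # byte semantics on a UTF-8 database
        "utf16_units": len(s.encode("utf-16-le")) // 2,   # 2-byte units
    }


def fits(s, n, semantics):
    """Hypothetical check: does s fit a limit of n under the given semantics?"""
    key = {"char": "code_points", "byte": "utf8_bytes", "unit": "utf16_units"}[semantics]
    return lengths(s)[key] <= n


if __name__ == "__main__":
    name = "\u00e9" * 255  # 255 copies of precomposed é
    print(lengths(name))
    print("char:", fits(name, 255, "char"))
    print("byte:", fits(name, 255, "byte"))
```

A 255-character name made only of é fits a 255-character limit but needs 510 bytes in UTF-8, which is the kind of attribute-length error the CR title describes.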

So, in order to implement proper unicode support Documentum must:

  • Do nothing for PostgreSQL
  • Change string semantics from byte to character in case of Oracle (i.e. alter table dm_sysobject_s modify (object_name varchar2(255 char)))
  • Change string datatype from varchar to vargraphic in case of DB2 (I believe something like ALTER TABLE DM_SYSOBJECT_S ALTER COLUMN OBJECT_NAME SET DATA TYPE VARGRAPHIC(255), though I’m not sure it will work)
  • Discontinue support of MSSQL, because nvarchar(n) measures n in 2-byte units and so effectively assumes that every character fits in 2 bytes (compare the UTF-8 encodings of precomposed é (C3 A9) and decomposed e + U+0301 (65 CC 81))
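The two byte sequences quoted for é in the last item are the precomposed and decomposed forms of the same visible character. A small Python sketch using the standard `unicodedata` module reproduces them; the helper name `utf8_hex` is mine, chosen only for illustration.

```python
import unicodedata


def utf8_hex(s):
    """Return the UTF-8 encoding of s as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))


# Precomposed form: a single code point, U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")
# Decomposed form: 'e' followed by the combining acute accent U+0301.
decomposed = unicodedata.normalize("NFD", "\u00e9")

if __name__ == "__main__":
    print(len(composed), utf8_hex(composed))
    print(len(decomposed), utf8_hex(decomposed))
    # The decomposed form also takes two 2-byte units in UTF-16.
    print(len(decomposed.encode("utf-16-le")) // 2)
```

Both strings render as one character on screen, yet the decomposed one occupies two code points, so even a limit counted in code points or 2-byte units does not match what the user sees as a character.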

So it is clear that proper Unicode support cannot be implemented on MSSQL, and OpenText will therefore do nothing, because otherwise Documentum would behave differently on different databases.
