Unicode support

Yesterday I discovered a funny blogpost about Unicode support in Documentum (I have no idea why it is named “DOCUMENTUM PROBLEMS AND HOW TO FIX THEM: #1” if it does not contain any solution), and now I would like to share my view of the problem.

It is not clear why that blogpost refers to “CS-49851 – “Server does not recognize a UTF-8 enabled database and unnecessarily errors on attribute length””, because I have seen other related CRs dating from 2005 or so; however, I can explain why OpenText will never implement proper Unicode support in Documentum.

At the moment Documentum supports four database engines:

  • MSSQL
  • Oracle
  • DB2
  • PostgreSQL

Which database do you think is the most problematic from a Unicode perspective? To answer this question we must understand what “varchar(n)” means for each database:

Database, data type, description:

  • MSSQL, varchar(n): Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
  • MSSQL, nvarchar(n): Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
  • Oracle, varchar2(n): The VARCHAR2 datatype stores variable-length character strings. When you create a table with a VARCHAR2 column, you specify a maximum string length (in bytes or characters) between 1 and 4000 bytes for the VARCHAR2 column. For each row, Oracle Database stores each value in the column as a variable-length field unless a value exceeds the column’s maximum length, in which case Oracle Database returns an error. Using VARCHAR2 and VARCHAR saves on space used by the table.
  • DB2, varchar(n): Varying-length character strings with a maximum length of n bytes. n must be greater than 0 and less than a number that depends on the page size of the table space. The maximum length is 32704.
  • DB2, vargraphic(n): Varying-length graphic strings. The maximum length, n, must be greater than 0 and less than a number that depends on the page size of the table space. The maximum length is 16352.
  • PostgreSQL, varchar(n): SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
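
To make the difference between these definitions concrete, here is a minimal Python sketch of how the same string is measured under each counting rule. It is a simplified model and not vendor code: the function names and the "byte" / "char" / "utf16" labels are mine, and the "byte" case assumes a UTF-8 database.

```python
def length_in_units(value: str, semantics: str) -> int:
    """Measure a string the way a column limit would count it (simplified model)."""
    if semantics == "byte":
        # Byte semantics in a UTF-8 database, e.g. Oracle varchar2(n byte), DB2 varchar(n)
        return len(value.encode("utf-8"))
    if semantics == "char":
        # Character semantics, e.g. PostgreSQL varchar(n), Oracle varchar2(n char)
        return len(value)
    if semantics == "utf16":
        # 2-byte units, e.g. MSSQL nvarchar(n)
        return len(value.encode("utf-16-le")) // 2
    raise ValueError(f"unknown semantics: {semantics}")


def fits(value: str, n: int, semantics: str) -> bool:
    """Would the value fit into a column declared with limit n?"""
    return length_in_units(value, semantics) <= n


cyrillic_name = "я" * 255          # 255 characters, 2 bytes each in UTF-8
emoji_name = "\U0001F600" * 255    # 255 characters outside the Basic Multilingual Plane

for label, name in (("cyrillic", cyrillic_name), ("emoji", emoji_name)):
    for semantics in ("byte", "char", "utf16"):
        print(label, semantics, length_in_units(name, semantics), fits(name, 255, semantics))
```

Under this model a 255-character Cyrillic object name does not fit into a 255-byte column, while a 255-character limit accepts it.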

So, in order to implement proper unicode support Documentum must:

  • Do nothing for PostgreSQL
  • Change string semantics from byte to character in case of Oracle (i.e. alter table dm_sysobject_s modify (object_name varchar2(255 char)))
  • Change string datatype from varchar to vargraphic in case of DB2 (I believe something like ALTER TABLE DM_SYSOBJECT_S ALTER COLUMN OBJECT_NAME SET DATA TYPE VARGRAPHIC(255), though I’m not sure it will work)
  • Discontinue support of MSSQL, because this database wrongly assumes that any character fits in 2 bytes: the n in nvarchar(n) counts 2-byte units, not characters (compare the UTF-8 forms é (C3 A9) and é (65 CC 81))
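
The é comparison can be reproduced with a few lines of Python. This is only an illustrative sketch (the helper name `describe` is mine): it shows that the same visible character may be one code point or two, and that the byte and 2-byte-unit counts change accordingly.

```python
import unicodedata


def describe(s: str) -> dict:
    """Report how long a string is under different counting rules."""
    return {
        "code_points": len(s),
        "utf8_hex": s.encode("utf-8").hex(" ").upper(),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


composed = unicodedata.normalize("NFC", "e\u0301")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")   # U+0065 followed by U+0301

print(describe(composed))
print(describe(decomposed))
```

Both strings render as é, yet the decomposed form needs two 2-byte units, so a limit expressed in such units does not correspond to a limit in visible characters.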

So it is clear that proper Unicode support cannot be implemented for MSSQL, and therefore OpenText will do nothing, because otherwise Documentum would behave differently on different databases.

Are changes coming?

Pro Documentum

Last week something weird happened: a couple of researchers disclosed information about vulnerabilities in Documentum xPression and Documentum WDK applications:

and these disclosures are qualitatively different from what EMC was publishing previously: these disclosures had been coordinated. Let me explain this point. When Documentum was under EMC's wing, EMC never published correct/true information about security flaws: they always underestimated the security impact and never mentioned that exploits/PoCs were available in the wild. Such behaviour obviously had a negative impact on customers: customers see that the vulnerability impact is medium and prefer not to install security fixes – that is a kind of…


Documentum innovations


A week ago OpenText announced yet another Documentum roadmap (here I recalled the following quote:

“I’ve been watching this process with OpenText for more than a decade, and I think, in 2009, I called them ‘The Roadmap Company,'” said Tony Byrne, founder of Real Story Group, a research and advisory firm in Olney, Md. “Every time [OpenText] acquires a company, they always have this story around innovation and synergy and a roadmap. It’s a very nice story for the customer and perhaps OpenText believes it, but it very rarely executes on it.”

), this roadmap contains more “technical details” than the previous one, so we may discuss it more thoroughly.

Brava! & Blazon

Actually, here I didn’t understand what innovations they were talking about, because both innovations are already available:


Why you should stay clear of REST. Part II


Actually, as a continuation of the previous blogpost I wanted to write about DDD and my own experience with DFS, CMIS and REST, but today I found another gem on LinkedIn: Performance Anti-Pattern For RESTful API – batch updating (saved copy). That blogpost hides package names:

but we all know what is hidden there:

As far as I know, batch updates were introduced in 7.2 and they are dead slow 🙂

Another LinkedIn gem: Potential Permanent Generation Leakage In Your JVM



Technology Services Group recently published a blogpost which compares PDF.js and OpenAnnotate. Unfortunately, their whole comparison is based on the hypothesis that PDF.js does not support progressive loading:

PDF.js loads the entire PDF into the client via JavaScript. This works fine for moderately large documents (10 pages), however many of our clients have documents in the 300-700 page range. Larger files put a lot of strain on the network, and leaves minimal options when it comes to performance tuning.

This is actually not true, because PDF.js has supported progressive loading since 2013: Implement progressive loading of PDFs. Actually, it is more correct to say that PDF.js has supported progressive loading since birth, because PDF.js was originally created as a Firefox extension, has been included in Mozilla Firefox since 2012 (version 15) and has been enabled by default since 2013 (version 19). Unfortunately, after receiving a couple of valuable comments, Technology Services Group embarrassingly decided to remove those comments and close the blogpost for further comments:

Those valuable comments were:

  • PDF.js does support PDF annotating capabilities via a plugin
  • If TSG thinks that progressive loading requires linearized PDFs, why do they not optimize all PDFs before storing them?

The most interesting thing here is the fact that progressive loading does not require linearized PDFs:
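
A rough illustration of why this is plausible: any PDF, linearized or not, ends with a "startxref" marker that gives the offset of its cross-reference table, so a client that can issue HTTP range requests may fetch the tail of the file first and then only the objects it needs. The Python sketch below shows just that first step. It is not PDF.js code, and the function names and the sample trailer bytes are made up for the example.

```python
import re


def tail_range_header(chunk_size: int = 1024) -> dict:
    """HTTP header asking only for the last chunk_size bytes of a resource."""
    return {"Range": f"bytes=-{chunk_size}"}


def find_startxref(tail: bytes) -> int:
    """Return the cross-reference table offset announced at the end of a PDF."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref marker in the supplied bytes")
    # Incrementally updated files may contain several markers; the last one wins.
    return int(matches[-1])


# Hypothetical tail of a PDF file, as it would come back from a suffix range request
sample_tail = b"trailer\n<< /Size 22 /Root 1 0 R >>\nstartxref\n18799\n%%EOF\n"
print(tail_range_header(512), find_startxref(sample_tail))
```

With the offset in hand, the cross-reference table and then individual page objects can be requested by byte range, without downloading the whole document.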

Why you should stay clear of REST


Once again I got inspired to write this blogpost by Alvaro de Andres' Documentum 16.3 delayed until Feb 2018 blogpost: there is some “interesting discussion” about REST where you can find the following opinion from another member of the talented team:

Is DFS a personal target for your projects? Baseline REST is a mature offering and it has active Engineering working on it. The planet seems to be focused on a RESTful interface and the DCTM platform is meeting that focus with a set of platform APIs. As I’m sure you are aware Captiva, xCP and D2 can be access through REST interfaces.

Actually, it is not clear what Tomas meant by the word “REST”, because my observations about REST are the following:

  • when most people talk about REST they typically mean JSON, which is obviously wrong: it is true that in the most cases (especially when REST client is…
