Sacrificing performance

It sounds ridiculous, but sometimes you have to sacrifice performance to gain stability. Let me elaborate.

About a year ago we decided to stop assigning multiple performers to a single workflow activity, i.e. when several users are supposed to do the same task in parallel (an approval workflow is a good example of such activities). Instead, we now spawn an individual workflow for each performer:

and manage the child workflows using the following pattern:

This approach has a couple of advantages over the default BPM capabilities, and the most important one is flexibility in workflow routing. The standard Documentum options are “wait until every performer makes a decision” or “complete activity after first reject”; now we are able to make routing decisions based on various factors such as the performer’s position in the organisational chart, the percentage of performers who made a negative decision, etc. So, if everything is perfect, what is the stability problem I’m talking about? This short workflow:

is a time bomb. The problem is the following: when Content Server starts a workflow (or completes a workflow task), it automatically creates the first (next) workflow task. What happens if the workflow task being created is a manual task? Content Server notifies the performer by e-mail. Now, let’s check how Content Server sends e-mail notifications:

API> retrieve,c,dm_method where object_name='dm_event_sender'
API> get,c,l,launch_async

So, launch_async is set to true, which means Content Server doesn’t wait for the method to complete, i.e. if your code looks like:

for (int i = 0; i < zillion; i++) {
  // launch new workflow
}

or like:

for (int i = 0; i < zillion; i++) {
  // queue notification
}

you risk bringing down either JMS or the operating system where Content Server resides; the result depends on how you send e-mail notifications. The general solution is to always add an automatic activity before a manual activity in a workflow, and to avoid using the com.documentum.fc.client.IDfSysObject#queue method.
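To make the trade-off a bit more tangible, here is a minimal sketch in plain Java. It uses no Documentum API at all: BoundedLaunchSketch and peakConcurrency are hypothetical names, and the short sleep is merely a stand-in for sending one e-mail. It is only an analogy for the idea of trading throughput for stability, not a description of how Content Server schedules work internally. However many jobs are queued, no more than the configured number of workers run at the same time, which is exactly the limit that a fire-and-forget asynchronous launch does not give you.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: no Documentum API is used here.
class BoundedLaunchSketch {

    // Runs `jobs` simulated notification jobs on at most `workers` threads
    // and returns the highest number of jobs observed running at once.
    static int peakConcurrency(int jobs, int workers) throws InterruptedException {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for sending one e-mail
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        // Stop accepting new jobs and wait for the queued ones to drain.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("jobs did not finish in time");
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak with 4 workers: " + peakConcurrency(200, 4));
    }
}
```

The loop that submits the jobs still finishes almost instantly, but the work itself is drained slowly by a fixed number of workers: slower overall, yet the host never sees a zillion processes at once.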

I know a little bit about SQL performance!

Yesterday I was reading a “document” called “EMC® Documentum® Platform and Platform Extensions Version 7.3 Installation Guide” and noticed a couple of weird statements/recommendations. These statements are:

Actually, I have “skipped” the really insane recommendations like “Change the value of Servername in ODBC.INI to localhost” (p. 94) or “put plaintext passwords in odbc.ini” (p. 93), but the last one, related to data clustering, attracted my attention. I was already familiar with this concept in Oracle (Index-Organized Tables, Table Clusters), but knew nothing about either MSSQL or PostgreSQL (nevertheless, the idea of aligning data according to the primary key sounds stupid regardless of the database vendor). The interesting thing here is that in the case of MSSQL the official documentation is wrong:

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order

The good news is that I have found a cool blog which sheds light on various aspects of database implementations. This blog has a 3-minute quiz, and I scored 100%:

MS Word templates

I continue generating texts and advertising really useful Java projects (previous advertisements: ICU, ImageIO-Ext). The new challenge was to generate MS Office Word documents, or rather to populate MS Office Word templates with repository data. Actually, generation and template processing are two different tasks: for the first one I would prefer either docx4j or XMLmind XSL-FO, but for template processing there are not many options available, especially if I introduce two extremely basic requirements:

  • support of loops/iterators/tables
  • support of decisions/conditions

Initially, we had the idea of using Aspose, but it completely lacks support for decisions/conditions; moreover, its support for loops/iterators/tables is extremely poor (besides the fact that the code is obfuscated). After that we discovered xdocreport, which perfectly fits our needs: