CMIS

About 6 months ago I was complaining that CMIS considers all requests containing the same credentials (i.e. the same login and password) as requests from the same client and serves all of them with a single repository session. Unfortunately, we did not implement the solution proposed in that blogpost – it is hard to maintain different passwords across clients – and meanwhile we started receiving concurrency-related errors, so something had to be done, and I have found a solution – it is enough to replace just two classes in CMIS:
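The replacement classes themselves were attached to the original post. Purely to illustrate the idea: the fix boils down to keying pooled repository sessions not by credentials alone, but by credentials plus a per-client discriminator. A minimal sketch (all names below are hypothetical, not the actual CMIS classes):

import java.util.Objects;

// Hypothetical sketch: stock behaviour effectively keys pooled repository
// sessions by credentials only, so all clients sharing the same login
// collide on the same session. Adding a per-client discriminator (e.g. the
// HTTP session id) to the key gives every client its own repository session.
public class RepositorySessionKey {

    private final String login;
    private final String password;
    private final String clientId; // e.g. HTTP session id; missing in stock CMIS

    public RepositorySessionKey(String login, String password, String clientId) {
        this.login = login;
        this.password = password;
        this.clientId = clientId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RepositorySessionKey)) {
            return false;
        }
        RepositorySessionKey k = (RepositorySessionKey) o;
        return Objects.equals(login, k.login)
                && Objects.equals(password, k.password)
                && Objects.equals(clientId, k.clientId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(login, password, clientId);
    }
}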

JMS high availability feature. Part II

Why did I recall a feature which I have never used before and will never use in the future? The explanation is simple: in order to refresh my memory I was reading the installation guide for Content Server 7.3 and noticed the following statement:

Actually, the documentation does not explain what “methods requiring trusted authentication” means – it seems that remote JMS supports workflow methods only – but from any perspective this statement sounds weird. The problem is that by that moment I had already discovered a vulnerability in Content Server which allows an attacker to download the $DOCUMENTUM_SHARED/config/dfc.keystore file. This file is very interesting because it allows connecting to Content Server as a superuser (note the value of the server_trust_priv flag):

[dmadmin@docu72dev01 config]$ keytool -list -v -keystore dfc.keystore 
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: dfc
Creation date: May 5, 2015
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a, O=EMC, OU=Documentum
Issuer: CN=dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a, O=EMC, OU=Documentum
Serial number: 4d23be10ce8e183732c451091e0e3dbf
Valid from: Tue May 05 16:03:10 MSK 2015 until: Fri May 02 16:08:10 MSK 2025
Certificate fingerprints:
         MD5:  8B:BD:5C:F6:18:9D:27:9F:28:A7:69:A4:45:AD:32:63
         SHA1: 37:CC:14:C7:3E:BA:8F:AF:CE:E8:E5:4E:D2:F5:01:AF:3E:B6:1D:3F
         SHA256: 88:FA:7A:04:F8:47:AE:88:AC:EB:D5:BE:28:80:A6:7E:21:51:34:86:A5:96:0E:FF:11:61:90:E9:EA:AC:B4:0C
         Signature algorithm name: SHA1withRSA
         Version: 1


*******************************************
*******************************************


API> retrieve,c,dm_client_rights where client_id='dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a'
...
08024be980000587
API> dump,c,l
...
USER ATTRIBUTES

  object_name                     : dfc_docu72dev01_3dXr0a
  title                           :
  subject                         :
  authors                       []: <none>
  keywords                      []: <none>
  resolution_label                :
  owner_name                      : dmadmin
  owner_permit                    : 7
  group_name                      : docu
  group_permit                    : 1
  world_permit                    : 1
  log_entry                       :
  acl_domain                      : dmadmin
  acl_name                        : dm_45024be980000222
  language_code                   :
  client_id                       : dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a
  public_key_identifier           : 5F6CF69241D4745C01C943BAD1AFFB027398EF32
  host_name                       : docu72dev01
  allowed_roles                 []: <none>
  allow_all_roles                 : T
  allow_all_priv_modules          : F
  principal_auth_priv             : T
  server_trust_priv               : T
  app_name                        :
  is_globally_managed             : F

So, there is a kind of interesting situation: official software is unable to take advantage of trusted authentication, but an attacker can 🙂

But last week EMC published another interesting support note – the JMS high availability feature does not work:

dfc.query.should_include_object_name

I have never thought that my colleagues might teach me something…

Yesterday I asked my colleague, who is trying to improve his skills in performance optimisation, whether he had any idea how to improve this SQL statement:

SELECT ALL dm_folder.r_object_id
  FROM dm_folder_sp dm_folder
 WHERE     (    EXISTS
                   (SELECT r_object_id
                      FROM dm_folder_r
                     WHERE     dm_folder.r_object_id = r_object_id
                           AND r_folder_path = :"SYS_B_00")
            AND (dm_folder.object_name = :"SYS_B_01"))
       AND (    dm_folder.i_has_folder = :"SYS_B_02"
            AND dm_folder.i_is_deleted = :"SYS_B_03")

and, surprisingly, the answer was: “Yes, I have seen something similar on the support site – EMC suggests setting the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties”, something like:

dfc.query.object_name_for_docbase[0]=<docbase_name>
dfc.query.should_include_object_name[0]=false


Well, as expected, both the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties are not documented, so let’s discuss the problem more thoroughly.

Imagine that we are maintaining the following folder structure in our docbase:

\_CLIENT_1
  \_CLAIMS
  \_INVOICES
\_CLIENT_2
  \_CLAIMS
  \_INVOICES
...
\_CLIENT_X
  \_CLAIMS
  \_INVOICES

i.e. for every client we create the same folder structure, and when we want to store an invoice for a particular client we do something like:

create,c,dm_document
set,c,l,object_name
xxx
link,c,/CLIENTS/CLIENT_1/INVOICES
save,c,l

the problem is that upon the link call DFC invokes the IDfSession#getFolderByPath method to retrieve the folder object with the particular path, and inside IDfSession#getFolderByPath DFC does the following: it cuts off the object name part from the path (i.e. everything after the last ‘/’) and builds the following DQL query:

SELECT r_object_id FROM dm_folder 
WHERE object_name='INVOICES' 
 AND ANY r_folder_path='/CLIENTS/CLIENT_1/INVOICES'
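For reference, that query is produced by an innocuous-looking DFC call; a minimal sketch, assuming an already established session:

import com.documentum.fc.client.IDfFolder;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;

public class FolderLookup {

    // DFC cuts "INVOICES" off the path and issues the overcomplicated
    // DQL query shown above instead of querying by r_folder_path alone
    public static IDfFolder invoicesFolder(IDfSession session) throws DfException {
        return session.getFolderByPath("/CLIENTS/CLIENT_1/INVOICES");
    }
}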

such an implementation is a bit weird for two reasons:

  • when I do the same I just write something like “retrieve,c,dm_folder where any r_folder_path='…'” and do not bother myself about the object name
  • Content Server has a built-in FolderIdFindByPath RPC command:
    API> apply,c,,FolderIdFindByPath,_FOLDER_PATH_,S,/dmadmin
    ...
    q0
    API> next,c,q0
    ...
    OK
    API> get,c,q0,result
    ...
    0c01d92080000105
    API> close,c,q0
    ...
    OK

    which generates the following effective SQL statement:

    select r_object_id from dm_folder_r where r_folder_path = :p0
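The same RPC command is reachable from DFC as well; a minimal sketch, assuming an established session (it mirrors the API sample above):

import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.DfList;
import com.documentum.fc.common.IDfId;
import com.documentum.fc.common.IDfList;

public class FolderIdFinder {

    public static IDfId findByPath(IDfSession session, String path) throws DfException {
        IDfList names = new DfList();
        IDfList types = new DfList();
        IDfList values = new DfList();
        names.appendString("_FOLDER_PATH_");
        types.appendString("S"); // S = string argument, as in the API sample
        values.appendString(path);
        // null object id corresponds to the empty id in "apply,c,,FolderIdFindByPath,..."
        IDfCollection coll = session.apply(null, "FolderIdFindByPath", names, types, values);
        try {
            return coll.next() ? coll.getId("result") : null;
        } finally {
            coll.close();
        }
    }
}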

so, I have no idea why DFC performs this extra logic here; moreover, with the current DFC implementation we get an overcomplicated SQL query, and sometimes the database engine fails to build a good execution plan for it (this is caused by the dumb recommendation to set the CURSOR_SHARING database parameter to FORCE; depending on the docbase structure, execution of such a query may take minutes). Below are two possible execution plans for this query:

good (dm_folder_r is the leading table – querying the dm_folder_r table by r_folder_path will always return no more than one row):

and bad (dm_folder_r is not the leading table – imagine that we have 1 million clients and hence 1 million INVOICES folders, so querying the dm_sysobject_s table by object_name first will return 1 million records):

in case of “retrieve,c,dm_folder where any r_folder_path='…'” the execution plan is always good:

In 2011 (if my memory serves me right) I solved such a performance problem by marking the index on dm_folder_r(r_folder_path) as unique – in this case the database engine always builds the correct execution plan because it knows that querying the dm_folder_r table will always return no more than one row. However, in recent versions of DFC it is possible to disable its dumb behaviour by setting the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties – I can’t understand why this isn’t enabled by default.
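For the record, the 2011 workaround looked something like the following (Oracle assumed, index name hypothetical – check the actual index name in your schema before recreating it):

-- r_folder_path values are unique across folders, and Oracle ignores
-- all-NULL keys in a single-column unique index, so the index on
-- dm_folder_r(r_folder_path) may be safely recreated as unique:
DROP INDEX dm_folder_path_index;
CREATE UNIQUE INDEX dm_folder_path_index ON dm_folder_r (r_folder_path);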

OpenText rep promised further stagnation of Documentum

Pro Documentum

Check the comments of “Maybe OpenText will add value to Documentum after all”:

Well,

Erik van Voorden: Nothing will change for documentum users. They will still get that same level of support as under EMC. Only the company name has changed.

How should one treat such statements from a person who has never worked in the ECM industry before and does not have a great employment history? I would say: sometimes it is better to keep silent, and in the case of Documentum and OpenText there are plenty of reasons to do that – take a look at how it is possible to turn a single phrase into a blogpost.

First of all, EMC support was always poor (I would rate it 2 out of 5); moreover, after 2013 it became inadequate (1 out of 5). Claiming that you are going to keep support at the same level is the worst advertisement ever –


anti-performance series

It has already been a year since I started nurturing the idea of writing a blogpost about performance best practices; unfortunately, this idea was doomed to failure from the start – there are a lot of materials that need to be verified before posting, and no doubt that would take a lot of time – so I “invented” another format: I will try to prove or ruin statements from the performance guides provided by the talented team.

Actually, some performance-related statements were already ruined in previous posts:

Minimizing and consolidating activities
System throughput varies between 3-100 activities per second, depending on system configuration and hardware. Workflows with more activities take longer to complete. The largest performance impact for processing activities results from opening up a new Content Server session. As a result, the biggest performance improvement comes from minimizing the number of discrete activities in a workflow. Minimize the number of workflow activities by, 1) eliminating unnecessary activities altogether or 2) consolidating the steps performed by multiple activities, into a single condensed activity.
To improve the completion rate of individual activities, do the following:

  • Use the bpm_noop template wherever possible. This particular noop does not create an additional template and does not send an HTTP post to the JMS
  • Within the automatic activity, do the work on behalf of a superuser instead of a regular user
  • Turn off auditing whenever unnecessary

Iteratively modify the number of system workflow threads to assess the impact on user response time, activity throughput, and system resource consumption. More workflow threads result in greater automatic activity throughput up to the point where system resource consumption degrades performance. Scale up slowly to understand when resource limitations begin to show (Content Server CPU and database CPU utilization). The following provides some guidelines:

  • A single CPU Content Server host cannot process 10,000 activities per hour, regardless of how it is configured
  • Be cautious if CPU or memory utilization exceeds 80% for any tier in the system
  • Do not configure more than three threads per CPU core

If throughput requirements exceed the capacity that a single Content Server can provide, add more Content Servers. Each Content Server instance (and associated workflow agent) nominally supports 15 concurrent workflow threads. Deploy one Content Server instance for every multiple of 15 concurrent workflow threads required by your solution. Avoid more than 25 workflow threads for any Content Server.

In general the statements above are misleading:

  • I doubt that “The largest performance impact for processing activities results from opening up a new Content Server session”: first, JMS does not open new sessions – all sessions are already in the session pool; the bad thing here is that DFC performs authentication when it acquires a session from the pool – CS generates a new login ticket for every auto-activity, these tickets never match the passwords associated with pooled sessions and, if my memory serves me right, such reauthentication takes 2 RPCs; second, dealing with a workitem typically takes 4 RPCs: begin transaction, acquire, complete, commit (see the sketch after this list), plus Content Server does some extra job (creating the next activity, updating the workflow object, etc.), plus we need to do some useful work (i.e. perform the business logic)
  • workflow delays caused by the processing of auto-activities do not affect business users: business users are not robots, they do not complete tasks at the speed of thought – a couple of extra minutes won’t matter. On the other hand, “consolidating” auto-activities has a negative impact on project complexity: you need to either consolidate both code and docbase methods or create an extra layer purposed to implement such consolidation (actually, we use the second option, but that wasn’t influenced by performance considerations), so it is much better to keep the code simple even though EMC’s idea about consolidation sounds reasonable
  • I have no idea what the prerequisites were for suggesting to invoke auto-activities under a superuser account (I would accept the following scenario: all auto-activities are invoked under the installation owner account and CS/JMS takes advantage of trusted authentication, but the workflow agent does not support such an option); my preferred option is to assign the “previous activity performer” as the performer of an auto-activity and take advantage of dynamic groups – such an approach keeps track of the last performer of manual activities, so business users are able to see who sent them a task
  • “10,000 auto-activities per hour for a single-CPU host” is an extremely pessimistic estimate – 30,000-50,000 is much closer to reality on modern hardware
  • there is no scientific explanation of why we need to limit the number of workflow threads to 25 (extra licence fees?) – I do believe that “2 × the number of cores” is a good starting point for any hardware configuration
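To illustrate the RPC count from the first point, the typical lifecycle of a workitem in DFC looks roughly like this (a sketch, assuming an established session and a known workitem id; the fetch inside getObject adds one more round trip on top of the four RPCs mentioned):

import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfWorkitem;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.IDfId;

public class WorkitemProcessor {

    // the four RPCs mentioned above: begin transaction, acquire, complete, commit
    public static void process(IDfSession session, IDfId workitemId) throws DfException {
        session.beginTrans();
        boolean completed = false;
        try {
            IDfWorkitem workitem = (IDfWorkitem) session.getObject(workitemId);
            workitem.acquire();
            // ... useful work (business logic) goes here ...
            workitem.complete();
            session.commitTrans();
            completed = true;
        } finally {
            if (!completed) {
                session.abortTrans();
            }
        }
    }
}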

JMS performance slowdown

About two months ago I noticed a weird behaviour of JMS: after a while (the shortest period I have observed was 2 days) the code related to “mathematical computations” starts executing extremely slowly – if previously the execution of a certain block of code normally took about 10ms, after a while it takes about 5 minutes. This behaviour is also accompanied by the following observation: the JVM does not utilise all available cores, i.e. I see multiple (>10) “computation” threads executing the same block of code, but the JVM does not consume more than 2 CPU cores.

I tried to eliminate all possible root causes (entropy, GC, RSA libraries), but nothing helped; today I have discovered the following topics, which look extremely similar to my performance issue:

And indeed, there is a difference between JBoss 7.1 and 7.2 in standalone.sh:

@@ -102,11 +146,6 @@
         if [ "x$NO_COMPRESSED_OOPS" = "x" ]; then
             "$JAVA" $JVM_OPTVERSION -server -XX:+UseCompressedOops -version >/dev/null 2>&1 && PREPEND_JAVA_OPTS="$PREPEND_JAVA_OPTS -XX:+UseCompressedOops"
         fi
-
-        NO_TIERED_COMPILATION=`echo $JAVA_OPTS | $GREP "\-XX:\-TieredCompilation"`
-        if [ "x$NO_TIERED_COMPILATION" = "x" ]; then
-            "$JAVA" $JVM_OPTVERSION -server -XX:+TieredCompilation -version >/dev/null 2>&1 && PREPEND_JAVA_OPTS="$PREPEND_JAVA_OPTS -XX:+TieredCompilation"
-        fi
     fi
 
     JAVA_OPTS="$PREPEND_JAVA_OPTS $JAVA_OPTS"

I have tried disabling the TieredCompilation option and now continue to monitor JMS behaviour 🙂
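If anybody wants to try the same, the flag may be pinned explicitly via JAVA_OPTS – as the diff above shows, the pre-7.2 standalone.sh only force-enabled tiered compilation when the explicit negative flag was absent from JAVA_OPTS, so the same setting works for both versions (path assumed, adjust for your JMS layout):

# $JBOSS_HOME/bin/standalone.conf
JAVA_OPTS="$JAVA_OPTS -XX:-TieredCompilation"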