WTF???

I like large deployments because large deployments mean a lot of challenges 🙂 And the new challenge is: ACLs are fucking slow because EMC fails to write fast code. Let's check how long it takes to create an ACL with 60000 accessors on the DFC side (I do know that it is not a good idea to create ACLs with more than 32767 accessors, but I like drawing cool charts):

public class DfAclTest extends DfcTestSupport {

    @Test
    public void aclTest() throws Exception {
        IDfSession session = getSession();
        int multiplier = 1000;
        IDfACL acl = (IDfACL) session.newObject("dm_acl");
        System.out.println("\n=========== CREATE ===========");
        long start = System.currentTimeMillis();
        for (int i = 1; i <= 60; i++) {
            grant(acl, (i - 1) * multiplier, multiplier);
            System.out.println("accessors: " + (i * multiplier) + ", time: "
                    + (System.currentTimeMillis() - start));

        }
        System.out.println("\n=========== UPDATE ===========");
        start = System.currentTimeMillis();
        for (int i = 1; i <= 60; i++) {
            grant(acl, (i - 1) * multiplier, multiplier);
            System.out.println("accessors: " + (i * multiplier) + ", time: "
                    + (System.currentTimeMillis() - start));

        }
    }

    private static void grant(IDfACL acl, int offset, int amount)
        throws DfException {
        for (int i = offset; i < amount + offset; i++) {
            acl.grant(String.valueOf(i), 3, null);
        }
    }

}

and the results are really impressive:

=========== CREATE ===========
...
accessors: 60000, time: 1155926
...
=========== UPDATE ===========
...
accessors: 60000, time: 3564534

i.e. if we needed to create an ACL with 60000 records it would take 20 minutes, and if we decided to update each record in that ACL it would take an hour 🙂 I was unable to understand how it was possible to write such a slow algorithm and decided to somehow optimize it; fortunately it didn't take much time – I just changed two methods in com.documentum.fc.client.DfACL:

  1. findEntry – from

        for (int i = 0, limit = getAccessorCount(); i < limit; i++) {
            if (getAccessorPermitType(i) == permitType
                    && getAccessorName(i).equals(accessorName)) {

     to

        int start = findString(ACCESSOR_NAME, accessorName);
        if (start < 0) {
            return -1;
        }

        for (int i = start, limit = getAccessorCount(); i < limit; i++) {
            if (getAccessorName(i).equals(accessorName)
                    && getAccessorPermitType(i) == permitType) {

  2. updateAuditTrail – to

        return;

and got more impressive results.

First optimization only:

=========== CREATE ===========
...
accessors: 60000, time: 456957
...
=========== UPDATE ===========
...
accessors: 60000, time: 537209

First and second optimizations:

=========== CREATE ===========
...
accessors: 60000, time: 136213
...
=========== UPDATE ===========
...
accessors: 60000, time: 131164

Chart:
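The shape of these timings is the classic quadratic-scan pattern: the original findEntry walks every existing accessor on each grant, so n grants cost O(n²) comparisons, while jumping straight to the candidate entry (or keeping an index) makes the whole run roughly linear. A DFC-free sketch of the difference (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AclScaling {

    // Mimics the original DfACL.findEntry: a full linear scan per grant,
    // so n grants cost O(n^2) comparisons overall.
    static void grantNaive(List<String> accessors, String name) {
        for (String existing : accessors) {
            if (existing.equals(name)) {
                return; // entry already present
            }
        }
        accessors.add(name);
    }

    // Mimics the idea of the patch: locate the candidate entry directly
    // (here via a hash index), so n grants cost O(n) overall.
    static void grantIndexed(List<String> accessors, Map<String, Integer> index, String name) {
        if (index.containsKey(name)) {
            return; // entry already present
        }
        index.put(name, accessors.size());
        accessors.add(name);
    }

    public static void main(String[] args) {
        int n = 20000;

        List<String> naive = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            grantNaive(naive, String.valueOf(i));
        }
        System.out.println("naive:   " + (System.currentTimeMillis() - start) + " ms");

        List<String> indexed = new ArrayList<>();
        Map<String, Integer> index = new HashMap<>();
        start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            grantIndexed(indexed, index, String.valueOf(i));
        }
        System.out.println("indexed: " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

The naive variant degrades quadratically with the number of accessors, which matches the shape of the numbers above.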

aspects or not aspects…

Some time ago I wrote a post about the lack of progress in the Documentum platform, and my opinion about the aspects feature was, I would say, neutral rather than negative. Now, after two years of using aspects, I can say without doubt: never ever use Documentum aspects. Avoid repeating my mistake – EMC's implementation is completely unreliable and slow. Let me explain…

At first, I do not want to discuss the DFC Aspect Model:

because it is hard to imagine a situation when you really need to build TBO classes at runtime – EMC tried to use this feature in XCP2 and failed. So, this feature is defective by design, and I would rather talk about the additional attributes which we can introduce using aspects.

Well, how does Content Server behave when it fetches aspect-enabled objects? Let's examine its behaviour using the previously demonstrated technique (I deliberately issue two fetches because the first one performs a lot of irrelevant caching queries related to information about types and TBOs):

Connected to Documentum Server running Release 7.2.0030.0195  Linux64.Oracle
Session id is s0
API> apply,c,,SQL_TRACE,SESSION_ID,S,01024be980005db7,LEVEL,I,10
...
q0
API> ?,c,q0
result      
------------
T           
(1 row affected)

API> fetch,c,09024be98001f94e
...
OK
API> fetch,c,09024be98001f94f
...
OK
API> Bye

and the result is:

-- determine object type by r_object_id
SELECT dm_dbalias_B.I_TYPE
  FROM DMI_OBJECT_TYPE dm_dbalias_B
 WHERE dm_dbalias_B.R_OBJECT_ID = :dmb_handle;

-- retrieve object's data from database
  SELECT *
    FROM CUSTOM_TYPE_RV dm_dbalias_B, CUSTOM_TYPE_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION;

-- retrieve aspect's data from database
  SELECT *
    FROM DMI_03024BE98000025B_RV dm_dbalias_B,
         DMI_03024BE98000025B_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION;

-- check whether object is actual or not
SELECT dm_dbalias_B.R_OBJECT_ID
  FROM DM_SYSOBJECT_S dm_dbalias_B
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_objectp
        AND dm_dbalias_B.I_VSTAMP = :dmb_versionp);

If you compare these queries with the queries described in the What happens when smart guy does not sit back blogpost, you will find that the presence of an aspect introduces two new queries, i.e.:

-- retrieve aspect's data from database
  SELECT *
    FROM DMI_03024BE98000025B_RV dm_dbalias_B,
         DMI_03024BE98000025B_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION;

-- check whether object is actual or not
SELECT dm_dbalias_B.R_OBJECT_ID
  FROM DM_SYSOBJECT_S dm_dbalias_B
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_objectp
        AND dm_dbalias_B.I_VSTAMP = :dmb_versionp);

So, it is obvious that aspects have a negative performance impact. What about reliability? Try to guess the purpose of the last select statement:

-- check whether object is actual or not
SELECT dm_dbalias_B.R_OBJECT_ID
  FROM DM_SYSOBJECT_S dm_dbalias_B
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_objectp
        AND dm_dbalias_B.I_VSTAMP = :dmb_versionp);

There is, though, a much simpler question – why, when fetching an object's data from the database, does Content Server issue the following statement:

  SELECT *
    FROM CUSTOM_TYPE_RV dm_dbalias_B, CUSTOM_TYPE_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION;

and not two much simpler statements:

  SELECT *
    FROM CUSTOM_TYPE_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle);

  SELECT *
    FROM CUSTOM_TYPE_RV dm_dbalias_B
   WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_handle)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION;

?

The clue for the last question is statement-level read consistency: in the case of a single query the result is always consistent; in the case of two queries it isn't. Now the purpose of the last select statement in the aspect example is clear: Content Server successively retrieves the object's data and the aspect's data, and after that checks whether the object has been changed since the first query. And now the most interesting thing: what happens if the last select statement returns nothing? Nothing good. The situation is really hard to reproduce, but I succeeded and wrote the following test case:

/**
 * @author Andrey B. Panfilov <andrey@panfilov.tel>
 */
public class DfExceptionsTest extends DfcTestSupport {

    @Test
    @Ignore
    public void testSoftFetch() throws Exception {
        final IDfSession s1 = Sessions.brandNew(getSession()
                .getSessionManager(), getSession().getDocbaseName());
        final IDfSession s2 = Sessions.brandNew(getSession()
                .getSessionManager(), getSession().getDocbaseName());

        TestFetch testFetch = new TestFetch(s2, "09024be98001f94e");
        Thread t = new Thread(testFetch);
        t.start();

        ApplyExecSQL cmd = (ApplyExecSQL) DfAdminCommand
                .getCommand(IDfAdminCommand.APPLY_EXEC_SQL);
        cmd.setQuery("update dm_sysobject_s set i_vstamp=i_vstamp+1 "
                + "where r_object_id='09024be98001f94e'");

        for (int i = 0; i < 10000; i++) {
            cmd.execute(s1);
        }

        t.interrupt();
        assertFalse(testFetch._failed);
    }

    class TestFetch implements Runnable {

        private final IDfSession _session;

        private final String _objectId;

        private boolean _failed;

        public TestFetch(IDfSession session, String objectId) {
            _session = session;
            _objectId = objectId;
        }

        @Override
        public void run() {
            try {
                for (int i = 0; i < 100000; i++) {
                    _session.getObject(DfIdUtil.getId(_objectId));
                    _session.flushObject(DfIdUtil.getId(_objectId));
                }
            } catch (DfException ex) {
                ex.printStackTrace();
                _failed = true;
            }
        }
    }

}

which reproduces two different exceptions:

DfException:: THREAD: Thread-9; MSG: [DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT]
   error:  "Unable to fetch object 09024be98001f94e alongwith its aspect attributes with consistency"; ERRORCODE: 100; 
NEXT: null
	at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
	at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1380)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1139)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1372)
	at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
	at com.documentum.fc.client.impl.session.Session.getObject(Session.java:963)
	at com.documentum.fc.client.impl.session.SessionHandle.getObject(SessionHandle.java:653)

and

DfIdNotFoundException:: THREAD: Thread-9; MSG: [DM_API_E_EXIST]
  error:  "Document/object specified by 09024be98001f94e does not exist."; ERRORCODE: 100; 
NEXT: DfException:: THREAD: Thread-9; MSG: [DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT]
  error:  "Unable to fetch object 09024be98001f94e alongwith its aspect attributes with consistency"; ERRORCODE: 100; 
NEXT: null
	at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:49)
	at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1380)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1139)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1372)
	at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
	at com.documentum.fc.client.impl.session.Session.getObject(Session.java:963)
	at com.documentum.fc.client.impl.session.SessionHandle.getObject(SessionHandle.java:653)
Caused by: DfException:: THREAD: Thread-9; MSG: [DM_SYSOBJECT_E_CANT_FETCH_INVALID_ID]
  error:  "Cannot fetch a sysobject - Invalid object ID : 09024be98001f94e"; ERRORCODE: 100; 
NEXT: null
	... 19 more
Followed by: DfException:: THREAD: Thread-9; MSG: [DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT]
  error:  "Unable to fetch object 09024be98001f94e alongwith its aspect attributes with consistency"; ERRORCODE: 100; 
NEXT: null
	at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
	at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1380)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1139)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1372)
	at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:191)
	at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:82)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
	at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
	at com.documentum.fc.client.impl.session.Session.getObject(Session.java:963)
	at com.documentum.fc.client.impl.session.SessionHandle.getObject(SessionHandle.java:653)

The last one is extremely weird because DFC hides the real error under DfIdNotFoundException. Now let's check what Content Server thinks about the DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT error:

SEVERITY: ERROR
DESCRIPTION: D Unable to fetch object %s alongwith its aspect attributes with consistency
CAUSE: This error occurs when a client fetches an object that has aspect attributes, and that either the normal or aspect attributes have changed perhaps by a different session while Content Server was fetching the aspect attributes after fetching the normal attributes from the database.
ACTION: Application should refetch the object.
PARAMS:

Why does the fucking DFC not retry the fetch after a DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT error?
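Since Content Server itself says "Application should refetch the object", the obvious client-side workaround is a retry wrapper around the fetch. A minimal, DFC-free sketch of the idea (the helper name retry is mine; with DFC the predicate would match ex.getMessageId() against DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT and the operation would be the session.getObject(...) call):

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class RetryFetch {

    // Re-runs the operation while the thrown exception is recognised
    // as transient; hard errors are propagated immediately.
    static <T> T retry(Callable<T> op, Predicate<Exception> isTransient, int maxAttempts)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception ex) {
                if (!isTransient.test(ex)) {
                    throw ex;
                }
                last = ex; // soft error: try again
            }
        }
        throw last; // still failing after maxAttempts
    }
}
```

This is exactly the kind of loop one would expect DFC to perform internally for this error.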

Encryption madness. Part IV

Interestingly, the dm_crypto_create utility is able to create AES128, AES192 and AES256 encryption keys – note the -algorithm argument:

[dmadmin@docu72dev01 ~]$ dm_crypto_create --help
Option --help has no meaning !!
Usage: dm_crypto_db_create [-lockbox <lockbox>] [-lockboxpassphrase [lockboxpassphrase]] 
                           [-keyname <keyname>] [-location <location>] 
                           [-passphrase [passphrase]] [-noprompt] [-move] 
                           [-check] [-algorithm] [-help] 

  -lockbox  - Optional lockbox to store administration key.
  -lockboxpassphrase  - Passphrase for the lockbox.
               Should be specified for a new lockbox.
  -keyname  - Optional name of the administration key to be stored in lockbox.
  -location  - Optional location of the administration key file.
  -passphrase  - Optional passphrase to protect the administration key.
               If not specified a default passphrase will be used.
  -move     - Optional argument to move existing key into lockbox.

  -check     - Optional argument to check for an existing administration key at
               the specified location.

  -algorithm   - Optional argument to specify the algorithm used for generating key.
                Valid values are AES_128_CBC, AES_192_CBC and AES_256_CBC. 
  -noprompt  - Optional flag to not prompt for default passphrase confirmation.
  -help      - Print this help/usage information.

** If -location is not specified, environment variable 
   DM_CRYPTO_FILE or DOCUMENTUM will be used to determine the location.
** -passphrase and -noprompt options are exclusive.

** It is suggested that you specify a passphrase **

but DFC recognises AES128 keys only – AES256 fails with a "PBKDF2WithHmacSHA256 SecretKeyFactory not available" error (this factory is available since Java 8 only):

 ~]$ dm_crypto_create -location /tmp/aek.key -noprompt -algorithm AES_256_CBC

** Will use default passphrase **



Please wait. This will take a few seconds ...
** Successfully created key store /tmp/aek.key using algorithm AES_256_CBC
[dmadmin@docu72dev01 ~]$ iapi DCTM_DEV
Please enter a user (dmadmin): 
Please enter password for dmadmin: 


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2015
        All rights reserved.
        Client Library Release 7.2.0030.0072


Connecting to Server using docbase DCTM_DEV
[DM_SESSION_I_SESSION_START]info:  "Session 01024be98000746a started for user dmadmin."


Connected to Documentum Server running Release 7.2.0030.0195  Linux64.Oracle
Session id is s0
API> initcrypto,c,/tmp/aek.key
...
Unexpected problem with key generation: PBKDF2WithHmacSHA256 SecretKeyFactory not available


API> 

AES192 fails with an "Unsupported algorithm ID: 4" error:

 ~]$ dm_crypto_create -location /tmp/aek.key -noprompt -algorithm AES_192_CBC

** Will use default passphrase **



Please wait. This will take a few seconds ...
** Successfully created key store /tmp/aek.key using algorithm AES_192_CBC
[dmadmin@docu72dev01 ~]$ iapi DCTM_DEV
Please enter a user (dmadmin): 
Please enter password for dmadmin: 


        EMC Documentum iapi - Interactive API interface
        (c) Copyright EMC Corp., 1992 - 2015
        All rights reserved.
        Client Library Release 7.2.0030.0072


Connecting to Server using docbase DCTM_DEV
[DM_SESSION_I_SESSION_START]info:  "Session 01024be98000746d started for user dmadmin."


Connected to Documentum Server running Release 7.2.0030.0195  Linux64.Oracle
Session id is s0
API> initcrypto,c,/tmp/aek.key
...
Unsupported algorithm ID: 4


API> 

I have checked DFC and found out that EMC made a mess not only of the encryption algorithm names on the CS side – their understanding of cryptography on the Java side is also extremely poor.
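By the way, the "SecretKeyFactory not available" part is easy to verify on any JVM with the standard JCA API; a small check (class name is mine) prints which PBKDF2 variants the runtime actually provides:

```java
import javax.crypto.SecretKeyFactory;

public class Pbkdf2Check {

    // Returns true if the running JVM provides the given SecretKeyFactory algorithm.
    static boolean isAvailable(String algorithm) {
        try {
            SecretKeyFactory.getInstance(algorithm);
            return true;
        } catch (java.security.NoSuchAlgorithmException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String alg : new String[] {"PBKDF2WithHmacSHA1", "PBKDF2WithHmacSHA256"}) {
            System.out.println(alg + ": " + isAvailable(alg));
        }
    }
}
```

On Java 7 the SHA-256 variant prints false, on Java 8 and later it prints true, which matches the initcrypto behaviour above.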

Diagnostics challenge. Part II

– Tony, there is a man I’d like you to find.
– Well, that depends on all the elements in the equation. How many are there?
– Forty thousand.

Yeah, in the case of Documentum there are too many elements in the equation, and all customer teams I have seen fail to troubleshoot performance issues – in most cases they try to ask somebody else for help, and the most stupid idea here is to ask the vendor.

Well, imagine you have identified that some Documentum activities got stuck due to a lock in the database, i.e. from the database perspective you are observing something like this:

what will be the next step in your troubleshooting investigation? Some guys think that it is a good idea to kill the blocking session or bounce everything 🙂 This is wrong! First we need to understand where the blocking session came from and what it is currently doing. In order to do so I have written a simple SQL query which may give some clues about the problem:

Now we know the outgoing TCP port of the Documentum process and can identify the Documentum session and the corresponding Java application:

Now that we know where the blocking session came from, how do we identify the corresponding execution stack? The simplest way is to make a heap dump, find the com.documentum.fc.client.impl.session.Session objects there and compare their m_connectionId with the Documentum session identifier found previously; m_transationThread is what we are looking for:

but for web applications I have written a simple JSP which does the same in a simpler manner:

Enjoy 🙂

Diagnostics challenge. Part I

Yesterday a former customer asked me to help with a performance issue – the database was suffering from a lot of locks:

and the most “popular” SQL statement was:

update dm_sysobject_s set i_vstamp=i_vstamp where r_object_id=:"SYS_B_0" and i_vstamp =:"SYS_B_1"

How do we diagnose such issues in a Documentum environment? First, we need to know what Java code produces such a SQL update statement; let's gather this information using tracing facilities:

[DM_SESSION_I_SESSION_START]info:  "Session 01024be980006909 started for user dmadmin."


Connected to Documentum Server running Release 7.2.0030.0195  Linux64.Oracle
Session id is s0
API> apply,c,,SQL_TRACE,SESSION_ID,S,01024be980006909,LEVEL,I,10
...  
q0   
API> apply,c,,LOG_ON,DETAIL,B,T
...
q1
API> create,c,dm_document
...
09024be980029502
API> save,c,l
...
OK
API> set,c,l,object_name
SET> xxx
...
OK
API> save,c,l
...
OK
API> checkout,c,l
...
09024be980029502
API> unlock,c,l
...
OK
API> begintran,c
...
OK
API> lock,c,l
...
OK
API> abort,c,
...
OK
API> begintran,c
...
OK
API> lock,c,l,T
...
OK
API> abort,c,l

and the gathered SQL patterns are the following:

com.documentum.fc.client.IDfPersistentObject#save:

UPDATE DM_SYSOBJECT_S dm_dbalias_B
   SET OBJECT_NAME = 'xxx',
       R_MODIFY_DATE =
          TO_DATE ('2016/11/03.00.21.49', 'YYYY/MM/DD.HH24.MI.SS'),
       R_MODIFIER = 'dmadmin',
       R_LOCK_OWNER = ' ',
       R_LOCK_DATE = TO_DATE ('0001/01/01.00.00.00', 'YYYY/MM/DD.HH24.MI.SS'),
       R_LOCK_MACHINE = ' ',
       I_VSTAMP = 1
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_handle
        AND dm_dbalias_B.I_VSTAMP = :dmb_version);

com.documentum.fc.client.IDfSysObject#checkout:

UPDATE DM_SYSOBJECT_S dm_dbalias_B
   SET R_LOCK_OWNER = 'dmadmin',
       R_LOCK_DATE = TO_DATE ('2016/11/03.00.21.56', 'YYYY/MM/DD.HH24.MI.SS'),
       R_LOCK_MACHINE = 'docu72dev01',
       I_VSTAMP = 2
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_handle
        AND dm_dbalias_B.I_VSTAMP = :dmb_version);

com.documentum.fc.client.IDfSysObject#cancelCheckout:

UPDATE DM_SYSOBJECT_S dm_dbalias_B
   SET R_LOCK_OWNER = ' ',
       R_LOCK_DATE = TO_DATE ('0001/01/01.00.00.00', 'YYYY/MM/DD.HH24.MI.SS'),
       R_LOCK_MACHINE = ' ',
       I_VSTAMP = 3
 WHERE (    dm_dbalias_B.R_OBJECT_ID = :dmb_handle
        AND dm_dbalias_B.I_VSTAMP = :dmb_version);

com.documentum.fc.client.IDfPersistentObject#lock:

UPDATE dm_sysobject_s
   SET i_vstamp = i_vstamp
 WHERE r_object_id = '09024be980029502';

com.documentum.fc.client.IDfPersistentObject#lockEx(true):

UPDATE dm_sysobject_s
   SET i_vstamp = i_vstamp
 WHERE r_object_id = '09024be980029502' AND i_vstamp = 3;

Now it is clear that the stuck SQL updates relate to IDfPersistentObject#lockEx(true) calls, but that is only part of the problem – when the customer put IDfPersistentObject#lockEx(true) calls into his code he was expecting concurrency issues in certain places, so the fact that IDfPersistentObject#lockEx(true) calls get stuck is not a root cause – it is expected behaviour. The problem is to investigate why these calls get stuck for such a long time.

PS.

I can't understand the purpose of the IDfPersistentObject#lockEx(true) call – from a programming perspective its behaviour is extremely weird:

S#1> begintran,c,
...
OK
S#1> save,c,09024be980029502
...
OK
S#2> begintran,c,
...
OK
S#2> lock,c,09024be980029502,T
...
S#1> commit,c
...
OK

S#2 [DM_OBJ_MGR_E_VERSION_MISMATCH]error:  "save of object 09024be980029502 
    of type dm_document failed because of version mismatch: old version was 3"

S#2> lock,c,09024be980029502,T
...
[DM_OBJ_MGR_E_VERSION_MISMATCH]error:  "save of object 09024be980029502 
     of type dm_document failed because of version mismatch: old version was 5"


S#2> revert,c,09024be980029502
...
OK
S#2> lock,c,09024be980029502,T
...
OK

So, the corresponding Java code would be:

public static void lock(IDfPersistentObject object) throws DfException {
	while (true) {
		object.revert();
		try {
			object.lockEx(true);
			break;
		} catch (DfException ex) {
			if (!"DM_OBJ_MGR_E_VERSION_MISMATCH".equals(ex.getMessageId())) {
				throw ex;
			}
		}
	}
}

why not write it in a more straightforward way (the most suitable way is described in Another way to implement "controlled" database lock):

public static void lock(IDfPersistentObject object) throws DfException {
	object.lock();
	object.revert();
}

?
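The semantics behind this can be modelled without DFC: lock() maps to an unconditional UPDATE of the row (it simply waits for the row lock), while lockEx(true) adds AND i_vstamp = &lt;cached version&gt; to the WHERE clause, so it fails with a version mismatch whenever the session's cached version is stale – which is exactly why the revert() is needed first. A toy model (class and method names are mine):

```java
public class OptimisticLock {

    // Minimal stand-in for a dm_sysobject_s row shared between sessions.
    static class Row {
        int vstamp;
    }

    // A session's cached view of the object.
    static class CachedObject {
        final Row row;
        int cachedVstamp;

        CachedObject(Row row) {
            this.row = row;
            this.cachedVstamp = row.vstamp;
        }

        // Models lockEx(true): "update ... set i_vstamp = i_vstamp
        // where r_object_id = ? and i_vstamp = ?" - fails when the
        // cached version is stale (DM_OBJ_MGR_E_VERSION_MISMATCH).
        boolean lockEx() {
            return row.vstamp == cachedVstamp;
        }

        // Models revert(): refreshes the cached state from the database.
        void revert() {
            cachedVstamp = row.vstamp;
        }
    }

    public static void main(String[] args) {
        Row row = new Row();
        CachedObject mine = new CachedObject(row);

        row.vstamp++;                      // another session saves the object
        System.out.println(mine.lockEx()); // false: version mismatch
        mine.revert();                     // refresh the cached version
        System.out.println(mine.lockEx()); // true: lock succeeds
    }
}
```

This mirrors the iapi transcript above: lock fails after the competing commit and succeeds only once the object has been reverted.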

Dealing with workflow methods. Part II

Well, referring to the previously posted diagram, what are the main problems in the implementation of the workflow engine in Documentum? Timeouts and error handling! Let me explain why.

Actually, I have no idea what EMC has been doing all that time, but the current implementation of the workflow engine is completely unreliable – the workflow agent manages the execution of automatic activities in an extremely odd way: it just sends HTTP requests to the JMS and waits for a response; in case of a timeout it pauses the execution of the workflow, but meanwhile the JMS continues to execute the automatic task, and sooner or later you will get something like:

DfException:: THREAD: http-0.0.0.0-9080-1; MSG: [DM_WORKFLOW_E_ACTION_NOT_ALLOWED]error:  "This operation is not allowed when the state is 'finished' for workitem '4a0011ec8004f500'."; ERRORCODE: 100; NEXT: null
    at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
    at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForBool(NetwiseDocbaseRpcClient.java:354)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$1.evaluate(DocbaseConnection.java:1151)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1085)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForBool(DocbaseConnection.java:1144)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.apply(DocbaseConnection.java:1129)
    at com.documentum.fc.client.impl.docbase.DocbaseApi.witemComplete(DocbaseApi.java:1193)
    at com.documentum.fc.client.DfWorkitem.completeEx2(DfWorkitem.java:505)
    at com.documentum.fc.client.DfWorkitem.completeEx(DfWorkitem.java:499)
    at com.documentum.bpm.DfWorkitemEx___PROXY.completeEx(DfWorkitemEx___PROXY.java)

Such errors are extremely painful because before restarting failed workflow activities you always need to investigate whether you actually need to re-execute the activity's body or not, i.e. if an auto-activity failed due to a timeout and its body does something like i=i+i, you will get wrong data upon restart. And it is not a joke: when restarting failed auto-activities you can specify whether the activity's body should be executed or not – Webtop does allow you to do this:

there is just a mistake in the API reference manual:

in order to skip execution of the activity's body you need to do something like:

API> fetch,c,4a024be980001502
...
OK
API> get,c,l,r_runtime_state
...
5
API> get,c,l,r_act_seqno
...
0
API> get,c,l,r_workflow_id
...
4d024be980001101
-- this places auto-activity into 
-- DM_INTERNAL_MANUAL_COMPLETE queue
-- and workflow agent won't pick it up
API> restart,c,4d024be980001101,0,T
...
OK
API> revert,c,4a024be980001502
...
OK
API> get,c,l,a_wq_name
...
DM_INTERNAL_MANUAL_COMPLETE
API> complete,c,4a024be980001502
...
OK
API> 
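To see why skipping the body matters, recall the i=i+i example: a non-idempotent body produces different data when it is executed twice (once before the timeout, once on restart), while a body that recomputes its result from the original input can be re-executed safely. A toy illustration (names are mine):

```java
public class RestartSafety {

    // Non-idempotent body: each (re-)execution changes the result.
    static int doubleInPlace(int i) {
        return i + i;
    }

    // Idempotent body: recomputes the result from the original input,
    // so a restart after a timeout produces the same value.
    static int doubledFromOriginal(int original) {
        return original * 2;
    }

    public static void main(String[] args) {
        int value = 21;

        // the auto-activity runs, the workflow agent times out, then restarts it
        value = doubleInPlace(value);
        value = doubleInPlace(value);
        System.out.println("non-idempotent after restart: " + value); // 84, not 42

        int original = 21;
        int result = doubledFromOriginal(original);
        result = doubledFromOriginal(original); // restart recomputes the same thing
        System.out.println("idempotent after restart:     " + result); // 42
    }
}
```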

So far, so good: now we know how to skip execution of the activity's body, but it is still necessary to investigate why the auto-activity failed in the first place. Is it possible to prevent these painful timeouts at all? I think timeouts are a design gap in the workflow engine, because the workflow agent is not executed inside the JMS context. However, we are forced to work with the current odd implementation and try to somehow resolve such issues. Typically, the Java code which serves auto-activity execution looks like:

public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null; 
			
	try {
		session = getSession();
		IDfWorkitem workitem = getWorkItem();
		if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
			workitem.acquire();
		}
		
		// perform business logic
		
		workitem.complete();
		
		return 0;
	} finally {
		if (session != null) {
			release(session);
		}
	}
}

but the correct one is:

public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null;

	try {
		session = getSession();
		session.beginTrans();
		IDfWorkitem workitem = getWorkItem();
		if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
			// this puts exclusive lock on workitem
			// in underlying database and prevents
			// workflow agent from pausing workitem
			workitem.acquire();
		} else if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_ACQUIRED) {
			// in case of restart workitem state is already
			// acquired, so, we are unable to call acquire,
			// but still need to put exclusive lock in database
			workitem.lock();
		} else {
			throw new DfException("Invalid workitem state");
		}

		// perform business logic

		workitem.complete();
		session.commitTrans();

		return 0;
	} finally {
		if (session != null) {
			if (session.isTransactionActive()) {
				session.abortTrans();
			}
			release(session);
		}
	}
}

The next challenge is error handling. The problem is that when dealing with Documentum we may face a lot of weird errors, and some of these errors are soft (for example, DM_SYSOBJECT_E_VERSION_MISMATCH) – to resolve them we just need to re-execute the code; others are not – we need to investigate the root cause. So, in case of soft errors it is a good idea to restart failed auto-activities automatically, and I invented the following pattern:

@Override
public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null;
	IDfWorkitem workitem = null;
	try {
		try {
			session = getSession();
			session.beginTrans();
			workitem = getWorkItem();
			if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
				// this puts exclusive lock on workitem
				// in underlying database and prevents
				// workflow agent from pausing workitem
				workitem.acquire();
			} else if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_ACQUIRED) {
				// in case of restart workitem state is already
				// acquired, so, we are unable to call acquire,
				// but still need to put exclusive lock in database
				workitem.lock();
			} else {
				throw new DfException("Invalid workitem state");
			}

			// perform business logic

			if (isSomethingWrong()) {
				haltWorkitem(workitem);
				session.commitTrans();
				return 0;
			}

			workitem.complete();
			session.commitTrans();

			return 0;
		} finally {
			// session may still be null if getSession() failed
			if (session != null && session.isTransactionActive()) {
				session.abortTrans();
			}
		}
	} catch (DfException ex) {
		// workitem may be null if the failure occurred before getWorkItem()
		if (workitem == null || !isSoftException(ex)) {
			throw ex;
		}
		haltWorkitem(workitem);
		return 0;
	}
}

protected void haltWorkitem(IDfWorkitem workitem) throws DfException {
	IDfSession session = workitem.getSession();
	IDfWorkflow workflow = (IDfWorkflow) session.getObject(workitem.getWorkflowId());
	// here transaction may be already inactive
	boolean txStartsHere = !session.isTransactionActive();
	try {
		// we need to start new transaction
		// in order to lock workitem
		if (txStartsHere) {
			session.beginTrans();
		}
		// exclusive access to workitem
		workitem.lock();
		workitem.revert();
		// restarting workitem - we are in transaction,
		// so workflow agent won't pickup it
		// actually we need to check both workitem
		// and workflow states
		workflow.restart(workitem.getActSeqno());
		workitem.revert();
		// let the dm_WFSuspendTimer job restart
		// our workitem
		workflow.haltEx(workitem.getActSeqno(), 1);
		if (txStartsHere) {
			session.commitTrans();
			txStartsHere = false;
		}
	} finally {
		if (txStartsHere && session.isTransactionActive()) {
			session.abortTrans();
		}
	}
}
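
The pattern above relies on an isSoftException helper that is not shown. A minimal sketch of such a classifier, assuming soft errors are recognised by their message ids (in real code the id would come from DfException#getMessageId(); here it is accepted as a plain string, and the list of ids is purely illustrative – extend it with whatever errors are safe to retry in your environment):

```java
import java.util.Set;

// Sketch of a soft-error classifier: an error is "soft" (retryable) when its
// message id is in a known allow-list. The ids below are examples only.
public class SoftErrors {

    private static final Set<String> SOFT_MESSAGE_IDS = Set.of(
            // concurrent modification – safe to retry
            "DM_SYSOBJECT_E_VERSION_MISMATCH",
            // object manager flavour of the same condition
            "DM_OBJ_MGR_E_VERSION_MISMATCH");

    public static boolean isSoft(String messageId) {
        return messageId != null && SOFT_MESSAGE_IDS.contains(messageId);
    }
}
```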

Encryption madness. Holy crap

Well, it started about a month ago with the following case:

My customer got an idea that it would be great to adopt CMIS for some integration routines, because CMIS is a standard, has an OOTB implementation, etc, etc, etc… Well, if CMIS is a standard, try it on your own and don’t bother me… but no such luck – all DFC-based applications seem to be specially designed to not work properly out of the box. The CMIS challenge was the following: the customer has about 3000 CMIS clients which, within a short period of time, load a couple of documents into the Documentum repository. The customer expected about 100-200 simultaneous connections, which is not a big deal for Content Server, but, as it turned out, 100-200 simultaneous connections is a big challenge for CMIS. The problem is that CMIS considers all requests which contain the same credentials (i.e. login and password are the same for all requests) as requests from the same client, and for all requests containing the same credentials CMIS uses only one repository session. Here’s an interesting point: three years ago EMC introduced the “Intelligent Session Management” feature (see also Ask the Expert: Features + Benefits of EMC Documentum D7 and xCP 2.0 Favourite Thing #20: Powered by D7) which was intended to somehow improve session management, but even now a lot of DFC applications use their own square wheels, which cause a lot of pain in the behind.
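
To make the failure mode concrete, the behaviour described above can be modelled as a session cache keyed by the credential pair: every request presenting the same login/password lands on the same entry, while a different (e.g. differently encrypted) password gets its own session. This is a toy model, not the actual CMIS code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a credentials-keyed session cache (NOT the actual CMIS code).
// All requests carrying the same login/password pair share one session entry,
// which is exactly why 3000 clients with one account pile up on one session.
public class SessionCacheModel {

    private final Map<String, String> sessions = new HashMap<>();
    private int counter = 0;

    public synchronized String sessionFor(String login, String password) {
        // the cache key is derived from the credentials only
        String key = login + "\u0000" + password;
        return sessions.computeIfAbsent(key, k -> "session-" + (++counter));
    }
}
```

With this model, handing every client a differently encrypted password for the same account gives each client its own session.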

Well, what do we need to do if using the same credentials across CMIS clients causes performance issues? I was considering two options:

  1. create a dedicated account for each CMIS client – obvious, but requires changing permission logic as well
  2. take advantage of “encrypted” passwords – this allows us to “generate different passwords” for the same account:
    API> encryptpass,c,dmadmin
    ...
    DM_ENCR_PASS_V2=AAAAEMdecoR51kuUZPXibPRM3a33NpNcQeCzwwqjlWtP8VV3gzMEDUdarrP12Ti0KbNDJSTegm70gbVJrOypK6qiFHA=
    API> encryptpass,c,dmadmin
    ...
    DM_ENCR_PASS_V2=AAAAEJIEZY3ZCuzJRBw1lrKifaoosOe7MjU4XuySVQC1J2fbfK6IpGDH7XOw/aNJd/raPU0vfoIH1+zRE+3vkfzrVBw=
    API> encryptpass,c,dmadmin
    ...
    DM_ENCR_PASS_V2=AAAAECcfEKMv8bsLZ5UIQy/0fmyKqVxy5OL/Yk+Nn9TEhRRnqM/7zJqWtbTMfDeVMk6MaIWudNFcFmSSJ/u6C8M0HZo=
    API> encryptpass,c,dmadmin
    ...
    DM_ENCR_PASS_V2=AAAAELFBEgQ1vVaJ0rJ1y4dMOQVgkFRQSCxC91IOkmYWKg5lb/cR3f0E3onkye75QsrF1AtV2GoTGvCCssRyPItU7as=
    

I chose the second one and got a lot of fun. Let me explain.
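
As an aside, getting a different DM_ENCR_PASS_V2 value on every encryptpass call is expected: the key uses AES_128_CBC, and CBC mode with a fresh random IV produces a different ciphertext for the same plaintext on every run. A standalone JDK illustration of that property (this is not Documentum’s actual encryption code):

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Encrypt the same plaintext twice under the same AES key but with random IVs:
// the two ciphertexts differ, just like repeated encryptpass outputs do.
public class CbcIvDemo {

    public static byte[] encryptWithRandomIv(SecretKey key, String plaintext) throws Exception {
        byte[] iv = new byte[16];                      // fresh IV per encryption
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
    }

    public static SecretKey newKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);                                  // AES-128, as in AES_128_CBC
        return kg.generateKey();
    }
}
```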

At first, in order to confirm my concerns about performance, I created a test plan in Apache JMeter, and after that I generated a set of encrypted passwords. If you have no idea how the “encrypted passwords” feature is supposed to work, I recommend reading pages 108 and 159 of the Content Server API Reference Manual to make sure that this feature was indeed documented. The general idea is the following:

  • We create a brand new aek.key file on CS side:
    [dmadmin@docu72dev01 ~]$ dm_crypto_create -location /tmp/aek.key -noprompt
    
    ** Will use default passphrase **
    
    
    ** Will use default algorithm **
    
    Please wait. This will take a few seconds ...
    Key - aek.key uses algorithm AES_128_CBC. 
    
    ** A key store already exists under /tmp/aek.key
    
  • After that we can generate encrypted passwords using initcrypto and encryptpass commands:
    Connected to Documentum Server running Release 7.2.0030.0195  Linux64.Oracle
    Session id is s0
    API> initcrypto,c,/tmp/aek.key
    ...
    OK
    API> encryptpass,c,test_password
    ...
    DM_ENCR_PASS_V2=AAAAEDAafp9V6OXlKarZHoN4k....
    API>      
    
  • in order to take advantage of encrypted passwords on the application side, we need to transfer the previously generated aek.key file to the application’s filesystem and start the application with the “DM_CRYPTO_FILE” environment variable pointing to the location of the aek.key file
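
For the last step, the startup might look like this on Linux (the key path and application jar name below are examples only):

```shell
# point DFC at the copied key file before launching the application
export DM_CRYPTO_FILE=/opt/cmis/config/aek.key
java -jar cmis-app.jar
```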

And only after a series of tests did I realise that I had missed two important steps (frankly speaking, I was more concerned with visualising results in Apache JMeter):

  • call initcrypto before generating passwords
  • copy aek.key onto CMIS filesystem

meanwhile Apache JMeter was working smoothly! What the hell is going on? Actually, I already mentioned that in Documentum 7.2 EMC performed a couple of weird changes related to security: Encryption madness, Encryption madness. Part II; also, a month ago Yuri Simione discovered another funny behaviour in Documentum 7.2: EMC Documentum 7.2 migration and Murphy’s laws. And here is yet another “security” feature of Documentum 7.2: now Content Server accepts encrypted passwords as if they were plain text!