What is wrong in Documentum. Part II

First of all, I would like to thank Dave for clarifying TSG’s points, but one of his statements has me stumped:

We would see it as very difficult for Documentum to hire new engineers for an old code base. That being said, there are C developers available but they are different from the Java engineers.

Let’s set aside Content Server and its C/C++ codebase and talk about DFC instead – it is written in Java and, according to TSG, was rewritten from scratch some time ago, so there is no doubt that the DFC codebase is “modern”.

14 months ago I found that DFC has issues with long-running transactions due to a memory leak. For example, the code below clearly demonstrates a memory leak in DFC:

public static void main(String[] args) throws DfException {
    IDfSession session = new DfClientX().getLocalClient().newSession(
            "repo", new DfLoginInfo("test01", "test01"));
    session.beginTrans();
    for (int i = 0; i < 100; i++) {
        IDfSysObject object = (IDfSysObject) session
                .newObject("dm_document");
        object.save();
    }
    System.out.println("Debugger breakpoint");
    session.commitTrans();
}

Debugger view:

i.e. DFC has an internal structure (the m_objectsToRevertOnAbort hashmap) where it stores all objects modified or created in a transaction. What is the purpose of this structure? When DFC rolls back a transaction, it reads all modified objects from this structure and reverts (i.e. refetches from Content Server) each object to keep the DFC cache consistent (i.e. in case of rollback DFC tries to return the DFC cache to the state “prior” to the start of the transaction).
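
To make the mechanism clearer, here is a rough sketch of what this bookkeeping presumably looks like (a hypothetical illustration only; the class and method names below are made up and are not the actual DFC internals):

import java.util.HashMap;
import java.util.Map;

import com.documentum.fc.client.IDfPersistentObject;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.IDfId;

// illustrative sketch only, not the real DFC source
class TransactionRevertBookkeeping {

    // every object created or modified inside an open transaction is remembered here
    private final Map<IDfId, IDfPersistentObject> m_objectsToRevertOnAbort =
            new HashMap<IDfId, IDfPersistentObject>();

    void rememberModifiedObject(IDfPersistentObject object) throws DfException {
        m_objectsToRevertOnAbort.put(object.getObjectId(), object);
    }

    // on rollback every remembered object is reverted, i.e. re-fetched from
    // Content Server one by one, to bring the session cache back in sync
    void onAbort() throws DfException {
        for (IDfPersistentObject object : m_objectsToRevertOnAbort.values()) {
            object.revert();
        }
        m_objectsToRevertOnAbort.clear();
    }
}

So, what pitfalls are there?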

  1. It’s obvious that such a reverting algorithm is extremely slow by design – when aborting a transaction DFC reverts every object individually. However, transaction rollback is not a common behaviour pattern in DFC applications, so instead of reverting each object it would be enough to simply flush the DFC cache (see the sketch after this list)
  2. It’s also obvious that such an algorithm consumes a lot of memory (actually, this is a memory leak), and EMC has confirmed this fact – see DFC-11950 in the DFC release notes
  3. And finally, such a reverting algorithm causes undefined behaviour in some cases. For example, this code:
    public static void main(String[] args) throws DfException {
        IDfSession session = new DfClientX().getLocalClient().newSession(
                "repo", new DfLoginInfo("test01", "test01"));
        String id;
        session.beginTrans();
        IDfQuery q = new DfQuery(
                "create dm_document object set object_name='test'");
        IDfCollection coll = q.execute(session, IDfQuery.DF_EXEC_QUERY);
        coll.next();
        id = coll.getString("object_created");
        session.abortTrans();
        IDfSysObject object = (IDfSysObject) session.getObject(DfId.valueOf(id));
    }
    

    throws DfIdNotFoundException (which is expected – the object created in the aborted transaction no longer exists), but adding just one line to that code causes it to throw DM_OBJ_MGR_E_FETCH_FAIL instead:

    public static void main(String[] args) throws DfException {
        IDfSession session = new DfClientX().getLocalClient().newSession(
                "repo", new DfLoginInfo("test01", "test01"));
        String id;
        session.beginTrans();
        IDfQuery q = new DfQuery(
                "create dm_document object set object_name='test'");
        IDfCollection coll = q.execute(session, IDfQuery.DF_EXEC_QUERY);
        coll.next();
        id = coll.getString("object_created");
        IDfSysObject tmp = (IDfSysObject) session.getObject(DfId.valueOf(id));
        session.abortTrans();
        IDfSysObject object = (IDfSysObject) session
                .getObject(DfId.valueOf(id));
    }

What was done in DFC-11950? Let’s check:

So, EMC developers didn’t understand the point about DFC cache flushing and just replaced persistent objects with their ids. Let’s check the new behaviour using another test case – this time I’m updating existing objects in a transaction:

public static void main(String[] args) throws DfException {
    IDfSession session = new DfClientX().getLocalClient().newSession(
            "repo", new DfLoginInfo("test01", "test01"));
    List<String> ids = new ArrayList<String>();
    for (int i = 0; i < 100; i++) {
        IDfSysObject object = (IDfSysObject) session
                .newObject("dm_document");
        object.save();
        ids.add(new String(object.getObjectId().getId()));
    }
    session.beginTrans();
    for (String id : ids) {
        IDfSysObject object = (IDfSysObject) session.getObject(DfId
                .valueOf(id));
        object.save();
    }
    System.out.println("Debugger breakpoint");
    session.commitTrans();
}

Debugger:

So, every DfId in m_objectIdsToRevertOnAbort is backed by a String, which in turn is backed by a char array of 1378 characters (and I’m only updating empty objects here – I think the space consumed by a DfId would be about 10-20K for real objects, i.e. about 50000 objects per 1GB).
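
Why does a 16-character object id retain a 1378-character array? My assumption: the id String handed out by DFC is a substring of a much larger internal buffer, and on older JDKs String.substring() shares the backing char array of the original string instead of copying it. That is also exactly what new String(...) works around, because the String(String) constructor trims the backing array:

// hypothetical illustration of substring() sharing the backing char[]
// (pre-JDK 7u6 behaviour; later JDKs copy the characters instead)
public class SubstringRetention {
    public static void main(String[] args) {
        // stand-in for a large buffer DFC may hold internally, e.g. a raw server response
        char[] buffer = new char[1378];
        java.util.Arrays.fill(buffer, 'x');
        String largeBuffer = new String(buffer);

        // a 16-character id carved out of it still pins the whole 1378-char array (on old JDKs)
        String objectId = largeBuffer.substring(0, 16);

        // new String(...) copies just the 16 characters, dropping the large array
        String trimmedId = new String(objectId);

        System.out.println(objectId.length() + " / " + trimmedId.length());
    }
}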

After that, EMC started discussing the possibility of replacing the DfIds in m_objectIdsToRevertOnAbort with Strings (they still don’t get the point about DFC cache flushing), and today I received a weird explanation from support:

Fuck yeah! DfId.valueOf(new String(object.getObjectId().getId())) is extremely slow, but object.revert() is extremely fast!

The codebase is not a problem at all.

3 thoughts on “What is wrong in Documentum. Part II”

  1. > So, EMC developers didn’t understand the point about DFC cache flushing and just replaced persistent objects with their ids.
    is it so simple? flush the DFC cache for all failed transactions instead of reverting the affected objects? irrespective of whether the DFC has a few objects or millions of cached objects?

    and EMC had to implement a hash table to keep track of modified objects rather than just say session->flush=true


  2. – DFC is unable to cache a lot of objects anyway, because the cache is backed by soft references.
    – it’s absolutely normal and correct to flush the session cache upon transaction rollback – rollback is an exceptional situation, and in general applications do not roll back transactions; if different application profiles are a concern – fine, add a new setting to dfc.properties
    – EMC developers think that new String(string) has a performance impact, but object.revert() does not – this is ridiculous.


  3. Andrey – good post, thanks. As to Dave’s quote that has you stumped – I think he’s talking about the codebase for the whole stack and how much of it is still C. As mentioned in this post you referenced: http://blog.tsgrp.com/2011/06/13/documentum-foundation-services-%E2%80%93-what-happened/, the DFC has been all Java since D6.5. However, the content server still uses legacy C code that has been around for a long time and the DMCL needs to be emulated for backwards compatibility. Not to mention the horror of writing dm_basic scripts, which you mentioned in your part III post – I totally agree with you there. So, I think Dave’s question is – now that most young programmers are coding in Java, Javascript, Python, Ruby, and others – do they really want to work with something like Documentum with all of the C code, DMCL emulation, dm_basic, etc? Not really an answerable question, but something to think about when it comes to innovation in future releases.

