Weird OOM

Yesterday I ran into a java.lang.OutOfMemoryError that even Google did not know about. It looked like this:

java.lang.OutOfMemoryError: (class: org/apache/fop/text/linebreak/LineBreakUtils, method: init1 signature: ()V) 
	at org.apache.fop.text.linebreak.LineBreakStatus.nextChar(LineBreakStatus.java:86)
	at org.apache.fop.layoutmgr.inline.TextLayoutManager.getNextKnuthElements(TextLayoutManager.java:772)
	at org.apache.fop.layoutmgr.inline.LineLayoutManager.collectInlineKnuthElements(LineLayoutManager.java:700)
	at org.apache.fop.layoutmgr.inline.LineLayoutManager.getNextKnuthElements(LineLayoutManager.java:629)
	at org.apache.fop.layoutmgr.BlockLayoutManager.getNextChildElements(BlockLayoutManager.java:141)
	at org.apache.fop.layoutmgr.BlockStackingLayoutManager.getNextKnuthElements(BlockStackingLayoutManager.java:289)
	at org.apache.fop.layoutmgr.BlockLayoutManager.getNextKnuthElements(BlockLayoutManager.java:113)
	at org.apache.fop.layoutmgr.BlockLayoutManager.getNextKnuthElements(BlockLayoutManager.java:105)
	at org.apache.fop.layoutmgr.table.TableCellLayoutManager.getNextKnuthElements(TableCellLayoutManager.java:191)
	at org.apache.fop.layoutmgr.table.RowGroupLayoutManager.createElementsForRowGroup(RowGroupLayoutManager.java:120)
	at org.apache.fop.layoutmgr.table.RowGroupLayoutManager.getNextKnuthElements(RowGroupLayoutManager.java:63)
	at org.apache.fop.layoutmgr.table.TableContentLayoutManager.getKnuthElementsForRowIterator(TableContentLayoutManager.java:270)
	at org.apache.fop.layoutmgr.table.TableContentLayoutManager.getNextKnuthElements(TableContentLayoutManager.java:212)
	at org.apache.fop.layoutmgr.table.TableLayoutManager.getNextKnuthElements(TableLayoutManager.java:273)
	at org.apache.fop.layoutmgr.FlowLayoutManager.getNextChildElements(FlowLayoutManager.java:223)
	at org.apache.fop.layoutmgr.FlowLayoutManager.addChildElements(FlowLayoutManager.java:147)
	at org.apache.fop.layoutmgr.FlowLayoutManager.getNextKnuthElements(FlowLayoutManager.java:116)
	at org.apache.fop.layoutmgr.FlowLayoutManager.getNextKnuthElements(FlowLayoutManager.java:69)
	at org.apache.fop.layoutmgr.PageBreaker.getNextKnuthElements(PageBreaker.java:252)
	at org.apache.fop.layoutmgr.AbstractBreaker.getNextBlockList(AbstractBreaker.java:643)
	at org.apache.fop.layoutmgr.PageBreaker.getNextBlockList(PageBreaker.java:178)
	at org.apache.fop.layoutmgr.PageBreaker.getNextBlockList(PageBreaker.java:158)
	at org.apache.fop.layoutmgr.AbstractBreaker.doLayout(AbstractBreaker.java:384)
	at org.apache.fop.layoutmgr.PageBreaker.doLayout(PageBreaker.java:112)
	at org.apache.fop.layoutmgr.PageSequenceLayoutManager.activateLayout(PageSequenceLayoutManager.java:138)
	at org.apache.fop.area.AreaTreeHandler.endPageSequence(AreaTreeHandler.java:267)
	at org.apache.fop.fo.pagination.PageSequence.endOfNode(PageSequence.java:130)
	at org.apache.fop.fo.FOTreeBuilder$MainFOHandler.endElement(FOTreeBuilder.java:360)
	at org.apache.fop.fo.FOTreeBuilder.endElement(FOTreeBuilder.java:190)
	at org.apache.xalan.transformer.TransformerIdentityImpl.endElement(TransformerIdentityImpl.java:1101)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:484)

Oracle provides troubleshooting guides for all JDK versions, but I was unable to find this pattern in their docs (Meaning of OutOfMemoryError – JDK7, Meaning of OutOfMemoryError – JDK6).

The answer to this puzzle is as follows: this OOM is raised by the CCout_of_memory function in check_code.c (part of the JVM’s bytecode verifier) and relates to native memory rather than the Java heap.
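
If you want to spot this kind of OOM in logs, here is a minimal sketch of a message-based triage. The class name OomTriage and its categories are my own invention for illustration; the only things taken from real life are the message shape shown in the stack trace above and two well-known HotSpot messages. Treat it as a rough filter, not a diagnosis.

```java
import java.util.regex.Pattern;

/** Rough triage of an OutOfMemoryError by its detail message (illustrative only). */
final class OomTriage {

    enum Kind { HEAP, NATIVE_THREAD, VERIFIER_NATIVE, OTHER }

    // Shape of the message from the stack trace above:
    // "(class: a/b/C, method: m signature: ()V)"
    private static final Pattern VERIFIER_MESSAGE =
            Pattern.compile("\\(class: \\S+, method: \\S+ signature: \\S+\\)\\s*");

    static Kind classify(String message) {
        if (message == null) {
            return Kind.OTHER;
        }
        if (message.contains("Java heap space")) {
            return Kind.HEAP;
        }
        if (message.contains("unable to create new native thread")) {
            return Kind.NATIVE_THREAD;
        }
        // No standard text, just class/method/signature: the verifier
        // failed to allocate native memory while checking that method.
        if (VERIFIER_MESSAGE.matcher(message).matches()) {
            return Kind.VERIFIER_NATIVE;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        try {
            throw new OutOfMemoryError(
                    "(class: org/apache/fop/text/linebreak/LineBreakUtils, method: init1 signature: ()V) ");
        } catch (OutOfMemoryError e) {
            System.out.println(classify(e.getMessage()));
        }
    }
}
```

For the VERIFIER_NATIVE case, increasing -Xmx is unlikely to help, because the allocation that failed was not made on the Java heap.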

Dealing with workflow methods. Part II

Well, referring to the previously posted diagram, what are the main problems in the implementation of the workflow engine in Documentum? Timeouts and error handling! Let me explain why.

I have no idea what EMC has been doing all this time, but the current implementation of the workflow engine is completely unreliable. The workflow agent manages the execution of automatic activities in an extremely odd way: it just sends HTTP requests to JMS and waits for a response. In case of a timeout it pauses the execution of the workflow, but meanwhile JMS continues to execute the automatic task, and sooner or later you will get something like:

DfException:: THREAD: http-0.0.0.0-9080-1; MSG: [DM_WORKFLOW_E_ACTION_NOT_ALLOWED]error:  "This operation is not allowed when the state is 'finished' for workitem '4a0011ec8004f500'."; ERRORCODE: 100; NEXT: null
    at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:57)
    at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
    at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForBool(NetwiseDocbaseRpcClient.java:354)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$1.evaluate(DocbaseConnection.java:1151)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1085)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForBool(DocbaseConnection.java:1144)
    at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.apply(DocbaseConnection.java:1129)
    at com.documentum.fc.client.impl.docbase.DocbaseApi.witemComplete(DocbaseApi.java:1193)
    at com.documentum.fc.client.DfWorkitem.completeEx2(DfWorkitem.java:505)
    at com.documentum.fc.client.DfWorkitem.completeEx(DfWorkitem.java:499)
    at com.documentum.bpm.DfWorkitemEx___PROXY.completeEx(DfWorkitemEx___PROXY.java)

Such errors are extremely painful because, before restarting failed workflow activities, you always need to investigate whether you actually need to re-execute the activity’s body or not. For example, if an auto-activity failed due to a timeout and its body does something like i=i+i, you will get wrong data upon restart. And it is not a joke: when restarting failed auto-activities you can specify whether the activity’s body should be executed or not. Webtop does allow this:

There is just a mistake in the API reference manual:

In order to skip execution of the activity’s body you need to do something like:

API> fetch,c,4a024be980001502
...
OK
API> get,c,l,r_runtime_state
...
5
API> get,c,l,r_act_seqno
...
0
API> get,c,l,r_workflow_id
...
4d024be980001101
-- this places auto-activity into 
-- DM_INTERNAL_MANUAL_COMPLETE queue
-- and workflow agent won't pick it up
API> restart,c,4d024be980001101,0,T
...
OK
API> revert,c,4a024be980001502
...
OK
API> get,c,l,a_wq_name
...
DM_INTERNAL_MANUAL_COMPLETE
API> complete,c,4a024be980001502
...
OK
API> 

So far, so good: now we know how to skip execution of the activity’s body, but we still need to investigate the root cause of why the auto-activity failed. Is it possible to prevent these painful timeouts at all? I do think that timeouts are a design gap in the workflow engine, because the workflow agent is not executed inside the JMS context. However, we are forced to work with the current odd implementation and somehow resolve such issues. Typically, the Java code which serves auto-activity execution looks like:

public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null; 
			
	try {
		session = getSession();
		IDfWorkitem workitem = getWorkItem();
		if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
			workitem.acquire();
		}
		
		// perform business logic
		
		workitem.complete();
		
		return 0;
	} finally {
		if (session != null) {
			release(session);
		}
	}
}

But the correct one is:

public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null;

	try {
		session = getSession();
		session.beginTrans();
		IDfWorkitem workitem = getWorkItem();
		if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
			// this puts exclusive lock on workitem
			// in underlying database and prevents
			// workflow agent from pausing workitem
			workitem.acquire();
		} else if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_ACQUIRED) {
			// in case of restart workitem state is already
			// acquired, so, we are unable to call acquire,
			// but still need to put exclusive lock in database
			workitem.lock();
		} else {
			throw new DfException("Invalid workitem state");
		}

		// perform business logic

		workitem.complete();
		session.commitTrans();

		return 0;
	} finally {
		if (session != null) {
			if (session.isTransactionActive()) {
				session.abortTrans();
			}
			release(session);
		}
	}
}

The next challenge is error handling. The problem is that when dealing with Documentum we may face a lot of weird errors. Some of these errors are soft (for example, DM_SYSOBJECT_E_VERSION_MISMATCH): to resolve them we just need to re-run the code. Others are not: we need to investigate the root cause. For soft errors it is a good idea to restart failed auto-activities automatically, so I came up with the following pattern:

@Override
public final int execute(Map params, PrintWriter printWriter) throws Exception {
	parseArguments(params);
	IDfSession session = null;
	IDfWorkitem workitem = null;
	try {
		try {
			session = getSession();
			session.beginTrans();
			workitem = getWorkItem();
			if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_DORMANT) {
				// this puts exclusive lock on workitem
				// in underlying database and prevents
				// workflow agent from pausing workitem
				workitem.acquire();
			} else if (workitem.getRuntimeState() == IDfWorkitem.DF_WI_STATE_ACQUIRED) {
				// in case of restart workitem state is already
				// acquired, so, we are unable to call acquire,
				// but still need to put exclusive lock in database
				workitem.lock();
			} else {
				throw new DfException("Invalid workitem state");
			}

			// perform business logic

			if (isSomethingWrong()) {
				haltWorkitem(workitem);
				session.commitTrans();
				return 0;
			}

			workitem.complete();
			session.commitTrans();

			return 0;
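		} finally {
			// session is null if getSession() itself failed
			if (session != null && session.isTransactionActive()) {
				session.abortTrans();
			}
		}
	} catch (DfException ex) {
		// without a workitem there is nothing to halt
		if (workitem == null || !isSoftException(ex)) {
			throw ex;
		}
		haltWorkitem(workitem);
		return 0;
	} finally {
		if (session != null) {
			release(session);
		}
	}
}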

protected void haltWorkitem(IDfWorkitem workitem) throws DfException {
	IDfSession session = workitem.getSession();
	IDfWorkflow workflow = (IDfWorkflow) session.getObject(workitem.getWorkflowId());
	// here transaction may be already inactive
	boolean txStartsHere = !session.isTransactionActive();
	try {
		// we need to start new transaction
		// in order to lock workitem
		if (txStartsHere) {
			session.beginTrans();
		}
		// exclusive access to workitem
		workitem.lock();
		workitem.revert();
		// restarting workitem - we are in transaction,
		// so workflow agent won't pickup it
		// actually we need to check both workitem
		// and workflow states
		workflow.restart(workitem.getActSeqno());
		workitem.revert();
		// let dm_WFSuspendTimer job to restart
		// our workitem
		workflow.haltEx(workitem.getActSeqno(), 1);
		if (txStartsHere) {
			session.commitTrans();
			txStartsHere = false;
		}
	} finally {
		if (txStartsHere && session.isTransactionActive()) {
			session.abortTrans();
		}
	}
}
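
The pattern above calls isSoftException without showing it, so here is one possible sketch. It is not the implementation I run in production: the class name SoftErrors is made up, and it works on plain Throwable messages (matching the "[DM_...]" code format you can see in the DfException message earlier in this post) so that it compiles without DFC on the classpath. The list of soft codes contains only the one named above; extend it for your environment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative helper: decides whether an error is worth an automatic retry. */
final class SoftErrors {

    // Assumption: only the code mentioned in the post is listed here.
    private static final Set<String> SOFT_CODES =
            new HashSet<>(Arrays.asList("DM_SYSOBJECT_E_VERSION_MISMATCH"));

    // Guard against pathological (cyclic) cause chains.
    private static final int MAX_DEPTH = 32;

    static boolean isSoftException(Throwable ex) {
        int depth = 0;
        for (Throwable t = ex; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            for (String code : SOFT_CODES) {
                // Server messages carry the code in square brackets,
                // e.g. "[DM_WORKFLOW_E_ACTION_NOT_ALLOWED]error: ..."
                if (message.contains("[" + code + "]")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Anything that is not explicitly listed is treated as a hard error, so an unknown failure is rethrown and still surfaces as a failed activity instead of being retried forever.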

Dealing with workflow methods. Part I

In the next blog post I’m going to describe some pitfalls related to the workflow engine in Documentum. For now, you can enjoy a nice diagram demonstrating how Content Server processes automatic workflow tasks:

Some facts:

  • Master agent idle time is defined in dm_server_config.wf_sleep_interval
  • The number of Worker agents is defined in dm_server_config.wf_agent_worker_threads
  • Worker agent idle time is a hardcoded value of 10 seconds
  • Content Server waits 1 second after spawning each Worker agent
  • To start tracing the workflow agent, issue the API command:
    apply,c,,SET_OPTIONS,OPTION,S,trace_workflow_agent,VALUE,B,T
  • To stop tracing the workflow agent, issue the API command:
    apply,c,,SET_OPTIONS,OPTION,S,trace_workflow_agent,VALUE,B,F
  • To stop the workflow agent, issue the API command:
    apply,c,,SHUTDOWN_WORKFLOW_AGENT,TIMEOUT,I,<timeout>
  • To start the workflow agent, issue the API command:
    apply,c,,START_WORKFLOW_AGENT

  • The DQL below displays the number of auto-activities not yet placed in the workflow agent queue:
    SELECT count(r_object_id) AS work_queue_size 
    FROM dmi_workitem
    WHERE r_runtime_state IN (0, 1)
     AND r_auto_method_id > '0000000000000000'
     AND a_wq_name is NULLSTRING
  • The DQL below displays the number of auto-activities placed in the workflow agent queue:
    SELECT count(r_object_id) AS work_queue_size 
    FROM dmi_workitem
    WHERE r_runtime_state IN (0, 1)
     AND r_auto_method_id > '0000000000000000'
     AND a_wq_name ='<id of dm_server_config>'
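
The two queries above differ only in the a_wq_name predicate. If you want to run them from a monitoring tool, a small helper along these lines could build both strings. This is just a sketch: the class WorkQueueDql and its method names are hypothetical, and the 16-hex-character check is my assumption about what a dm_server_config object id looks like, based on the ids shown elsewhere on this blog.

```java
/** Illustrative builder for the two work-queue DQL statements listed above. */
final class WorkQueueDql {

    // Common part: dormant or acquired workitems that belong to auto-activities.
    private static final String BASE =
            "SELECT count(r_object_id) AS work_queue_size"
            + " FROM dmi_workitem"
            + " WHERE r_runtime_state IN (0, 1)"
            + " AND r_auto_method_id > '0000000000000000'";

    /** Auto-activities that no workflow agent has claimed yet. */
    static String notYetQueued() {
        return BASE + " AND a_wq_name IS NULLSTRING";
    }

    /** Auto-activities claimed by the agent of the given dm_server_config. */
    static String queuedFor(String serverConfigId) {
        // Assumption: object ids are 16 hex characters; rejecting anything
        // else also keeps stray quotes out of the DQL string.
        if (serverConfigId == null || !serverConfigId.matches("[0-9a-fA-F]{16}")) {
            throw new IllegalArgumentException("expected a 16-character hex object id");
        }
        return BASE + " AND a_wq_name = '" + serverConfigId + "'";
    }
}
```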

User renaming

Recently I was debugging the user renaming procedure (install/admin/userrename.ebs) and realised that EMC’s implementation smells a lot. Just look at the weird logic related to locked and immutable objects:


  ' Unlock (or only report) sysobjects which are locked by the Old User.
  Call DmPrint(" ", "information")
  Call DmPrint("====== Sysobjects that are locked and have references to user '" & OldUserName & _
               "' =====", "information")
  If(UnlockLckObj = true) then
       Call DmPrint("(all the objects in this list will be unlocked)", _
                    "information")
  Else
       Call DmPrint("(all the objects in this list will remain locked)", _
                    "information")
  End If
  Call DmPrint(" ", "information")

        QueryStr$ = "query,c,select r_object_id from " & _
                                "dm_sysobject (all) where " & _
                                "(owner_name = '" & DqlOldUserName & "' or " & _
                                "r_creator_name = '" & DqlOldUserName & "' or " & _
                                "r_modifier = '" & DqlOldUserName & "' or " & _
                                "acl_domain = '" & DqlOldUserName & "' or " & _
                                "r_lock_owner = '" & DqlOldUserName & "') and " & _
                                "r_lock_owner <> ' '"
  Call DmUpdateSysobj("unlock", QueryStr)

  Call DmRenameDmUserObj()

...

' Update (or only report) sysobjects which are not locked.
  Call DmPrint(" ", "information")
  Call DmPrint("====== Sysobjects referencing user '" & OldUserName & _
               "', which are not locked =====", "information")
  Call DmPrint(" ", "information")

  QueryStr$ = "query,c,select r_object_id from " & _
              "dm_sysobject (all) where " & _
              "(owner_name = '" & DqlOldUserName & "' or " & _
              "r_creator_name = '" & DqlOldUserName & "' or " & _
              "r_modifier = '" & DqlOldUserName & "' or " & _
              "acl_domain = '" & DqlOldUserName & "' or " & _
              "r_lock_owner = '" & DqlOldUserName & "') and " & _
              "r_lock_owner = ' '"
  Call DmUpdateSysobj("", QueryStr)

...

Sub DmUpdateSysobj(unlockOnly As String, QueryStr As String)
  Dim DmQuery As String, ObjectId As String, theModifier As String
  Dim mutable As String, DmObjectType As String, DmOwnerName As String, AttrList As String, DmModifierName As String, DmLockOwner As String
  Dim DmAclDomain As String, DmCreatorName As String
  Dim NumOfObj As Long, ret As Integer
  Dim lockOwner As String
  
  DmQuery = dmAPIGet(QueryStr)
  If DmQuery = "" then
     Call DmPrint("Could not query sysobjects referencing user " & _
                   OldUserName, "fatal")
  End If                
  
  NumOfObj = 0
  While dmAPIExec("next,c," & DmQuery)
     NumOfObj = NumOfObj + 1
     ObjectId   = dmAPIGet("get,c," & DmQuery & ",r_object_id") 
     ret      = dmAPIExec("fetch,c," & ObjectId)
     If ret = 0 then
        Call DmPrint("Could not fetch object with Id " & ObjectId & _
                 ".", "fatal")
     End If
     If unlockOnly = "unlock" Then
        Call UnlockObject(ObjectId)
     Else
         DmObjectType = dmAPIGet("get,c,l,r_object_type")
         lockOwner = dmAPIGet("get,c,l,r_lock_owner")
         If ReportOnly = false Then
            ' save latest modifier, check immutability ret
            theModifier = dmAPIGet("get,c,l,r_modifier")
            mutable = dmAPIGet("get,c,l,r_immutable_flag")
            If mutable = "1" Or mutable$ = "T" Then
                ret = dmAPISet("set,c,l,r_immutable_flag", "F")
            End If
         End If
            
         AttrList = ""
         DmOwnerName = dmAPIGet("get,c," & ObjectId & ",owner_name")
         If(DmOwnerName = OldUserName) then 
            AttrList = "owner_name"
         End If
            
         DmAclDomain = dmAPIGet("get,c," & ObjectId & ",acl_domain")
         If(DmAclDomain = OldUserName) then
            If(AttrList <> "") then
                AttrList = AttrList & ","
            End If
            AttrList = AttrList & "acl_domain"
         End If
            
         DmCreatorName = dmAPIGet("get,c," & ObjectId & ",r_creator_name")
         If(DmCreatorName = OldUserName) then
            If(AttrList <> "") then
                AttrList = AttrList & ","
            End If
            AttrList = AttrList & "r_creator_name"
         End If

         DmModifierName = dmAPIGet("get,c," & ObjectId & ",r_modifier")
         If(DmModifierName = OldUserName) then
            If(AttrList <> "") then 
                AttrList = AttrList & ","
            End If
            AttrList = AttrList & "r_modifier"
         End If

         DmLockOwner = dmAPIGet("get,c," & ObjectId & ",r_lock_owner")
         If(DmLockOwner = OldUserName) then
            If(AttrList <> "") then
                AttrList = AttrList & ","
            End If
            AttrList = AttrList & "r_lock_owner"
         End If

         If(AttrList <> "") then
            Call CaExecute(DmObjectType, ObjectId, "object_name", AttrList, _
                   "Could not save sysobject", "error", "dm_sysobject", "")
         End If

         If ReportOnly = false Then
            ' set immutability back, if necessary
            If mutable$ = "1" Or mutable$ = "T" Then
                ret = dmAPISet("set,c,l,r_immutable_flag", "T")
                ret = dmAPIExec("save,c,l")
            End If
            ' set modifier back to original, or newname if last modified by renamed user
            If theModifier$ = OldUserName Then theModifier$ = NewUserName
            ' Checking for ReportOnly above, don't need to call ExecuteUpdateSql
            ret = dmAPIExec("execsql,c,update dm_sysobject_s set r_modifier = '" & theModifier$ & _
                "' where r_object_id = '" & ObjectId$ & "'")
         End If
     End If
  Wend
  ret = dmAPIExec("close,c," & DmQuery)
     Call DmPrint("**** Number of sysobjects affected: " & NumOfObj, _
                  "information")
End Sub

Below is my extremely clear and straightforward implementation:


  Call DmUpdateUsrDefACL()
  
' Update sysobjects.

  Call DmUpdateSysobj() 

...

Sub DmUpdateSysobj()
  Dim query As String
  Dim ret As Integer 
  ' todo: add logic for MSSQL
  query = "update dm_sysobject_s " & _
          "set i_vstamp=i_vstamp+1, " & _
          "r_modifier=DECODE(r_modifier, '" & DqlOldUserName & "','" & DqlNewUserName & "', r_modifier), " & _
          "r_lock_owner=DECODE(r_lock_owner, '" & DqlOldUserName & "','" & DqlNewUserName & "', r_lock_owner), " & _
          "owner_name=DECODE(owner_name, '" & DqlOldUserName & "','" &  DqlNewUserName & "', owner_name), " & _
          "acl_domain=DECODE(acl_domain, '" & DqlOldUserName & "','" &  DqlNewUserName & "', acl_domain), " & _
          "r_creator_name=DECODE(r_creator_name, '" & DqlOldUserName & "','" & DqlNewUserName & "', r_creator_name) " & _
          "where (owner_name = '" & DqlOldUserName & "' or " & _
          "r_creator_name = '" & DqlOldUserName & "' or " & _
          "r_modifier = '" & DqlOldUserName & "' or " & _
          "acl_domain = '" & DqlOldUserName & "' or " & _
          "r_lock_owner = '" & DqlOldUserName & "')"
         
  Call DmPrint("Updating sysobjects using query: " & query, "information")
  ret = ExecuteUpdateSql(query, "Update sysobject attributes for user " & OldUserName)
  If ret <> 1 Then
    Call DmPrint("Could not update sysobject attributes for user " & OldUserName,"fatal")
    Exit Sub    
  End If        
End Sub
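
About the “todo: add logic for MSSQL” comment: DECODE is Oracle-specific, while a CASE expression is accepted by both Oracle and SQL Server. Below is a hedged sketch, in Java rather than Docbasic, of how the same statement could be generated in a portable form. The class RenameSql and its methods are hypothetical names, and I have not run the generated statement against a real repository database.

```java
/** Illustrative generator of a portable (CASE-based) rename statement. */
final class RenameSql {

    // Same columns as in the DECODE version above.
    private static final String[] COLUMNS = {
        "r_modifier", "r_lock_owner", "owner_name", "acl_domain", "r_creator_name"
    };

    /** Wraps a value in single quotes, doubling embedded quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    static String build(String oldName, String newName) {
        String oldLiteral = quote(oldName);
        String newLiteral = quote(newName);
        StringBuilder set = new StringBuilder("i_vstamp = i_vstamp + 1");
        StringBuilder where = new StringBuilder();
        for (String column : COLUMNS) {
            // CASE keeps the current value unless it equals the old user name.
            set.append(", ").append(column)
               .append(" = CASE WHEN ").append(column).append(" = ").append(oldLiteral)
               .append(" THEN ").append(newLiteral)
               .append(" ELSE ").append(column).append(" END");
            if (where.length() > 0) {
                where.append(" OR ");
            }
            where.append(column).append(" = ").append(oldLiteral);
        }
        return "UPDATE dm_sysobject_s SET " + set + " WHERE " + where;
    }

    public static void main(String[] args) {
        System.out.println(build("old_user", "new_user"));
    }
}
```

The WHERE clause is the same as in the DECODE version, so only rows that reference the old user get their i_vstamp bumped.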

Enjoy 🙂

WebTop’s new content transfer mechanism

LOL 🙂

Alvaro de Andres' Blog

Yesterday I saw that Webtop 6.8.2 was released, and this was great news because this version was supposed to remove Java applets once and for all. While most of us thought that removing applets would imply moving to HTML5, EMC has decided to implement an intermediate solution 😦

Webtop 6.8.2 will prompt users to install a browser extension the first time they log in:

[image: webtop-warning]

In Firefox the extension will be installed, and in Chrome the user will be redirected to the Chrome Web Store:

[image: webtop-chrome-extension]

After installing the extension, the first time users try to download something they’ll be prompted to install a local application that will handle the transfers (WebSockets I presume).

Good news: Now you don’t need to fight with Java/browser combination

Bad news: Users have to perform operations. Not the cleanest solution (IMHO).

Bonus: Now importing files is really not user-friendly:

  1. File -> import
  2. Ugly Java window will pop-up…


Webtop 6.8.2 released

Actually, nothing has changed: Webtop still uses UCF for content transfer and requires an installed JRE.

Alvaro de Andres' Blog

With the announcements of the Dell-EMC merger and OpenText buying Documentum, it looks like this slipped through. EMC/DellEMC ECD/OpenText released WebTop with the new content transfer mechanism.

Release notes: https://support.emc.com/docu78653_Documentum-Webtop-6.8.2-Release-Notes.pdf
