Surprise: Documentum loses data

I’m not sure whether it is a common practice, but all Documentum projects I have participated in have had the following requirement: every document created in the repository must have a unique human-readable identifier, which business users may refer to in their daily activities. Extra requirements could be:

  • identifiers are sequential
  • there should be no holes in the sequence
  • the sequence starts from the beginning on every new day/week/month/year

and, depending on the requirements, the implementation could be either optimistic:

private int getAnnualNextNumber(ICounter counter)
        throws DfException {
    DfException resultException = null;
    for (int i = 0; i < 10; i++) {
        try {
            synchronized (CounterService.class) {
                int lastNumber = counter.getLastNumber() + 1;
                counter.setLastNumber(lastNumber);
                counter.save();
                return lastNumber;
            }
        } catch (DfException dfe) {
            resultException = dfe;
            // neither fetch nor revert is required because DFC
            // resets object's state in
            // com.documentum.fc.client.DfSysObject#doSave
            // counter.fetch(null);
        }
    }
    throw resultException;
}

or pessimistic:

private int getAnnualNextNumber(ICounter counter)
    throws DfException {
    synchronized (CounterService.class) {
        counter.lock();
        int lastNumber = counter.getLastNumber() + 1;
        counter.setLastNumber(lastNumber);
        counter.save();
        return lastNumber;
    }
}
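Stripped of the Documentum API, the optimistic variant is just a compare-and-swap retry loop. Below is a repository-free sketch of the same idea (class and method names are mine, not DFC’s):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticCounter {

    // stands in for the persisted counter object; a real implementation
    // would re-fetch the object and save it back to the repository
    private final AtomicInteger lastNumber = new AtomicInteger();

    public int nextNumber() {
        while (true) { // the DFC version above bounds retries at 10
            int current = lastNumber.get();                // "fetch"
            int next = current + 1;
            if (lastNumber.compareAndSet(current, next)) { // "save" wins only if
                return next;                               // nobody saved first
            }
            // version mismatch: another session saved the counter first, retry
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OptimisticCounter counter = new OptimisticCounter();
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                counter.nextNumber();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // 2000 concurrent increments: no duplicates and no holes
        System.out.println(counter.nextNumber()); // prints 2001
    }
}
```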

The interesting thing here is that, in the case of the optimistic implementation, we observe our code generating duplicates; from the audit perspective it looks like CS does not receive the updated value from DFC:

Setup dfc.properties properly

I can’t stop catching myself thinking about the JMS high availability misfeature, or, more precisely, about why it is supported by workflow methods only; it now seems that all CS instances are simply misconfigured. Let me explain.

Below is a brand new CS installation, which already has one dm_client_rights record:

Connected to Documentum Server running Release 7.3.0050.0039  Linux64.Postgres
Session id is s0
API> ?,c,select r_object_id, client_id from dm_client_rights
r_object_id       client_id                               
----------------  ----------------------------------------
0802994680000582  dfc_xxpa7jhtGFkRfHvrQmUIyaffxlka        
(1 row affected)

API> dump,c,0802994680000582
...
USER ATTRIBUTES

  object_name                     : dfc_docu73dev01_ffxlka
  title                           : 
  subject                         : 
  authors                       []: <none>
  keywords                      []: <none>
  resolution_label                : 
  owner_name                      : dmadmin
  owner_permit                    : 7
  group_name                      : docu
  group_permit                    : 1
  world_permit                    : 1
  log_entry                       : 
  acl_domain                      : dmadmin
  acl_name                        : dm_4502994680000222
  language_code                   : 
  client_id                       : dfc_xxpa7jhtGFkRfHvrQmUIyaffxlka
  public_key_identifier           : 77016FB9066276A0EF4801918F27F52C7176CD2F
  host_name                       : docu73dev01
  allowed_roles                 []: <none>
  allow_all_roles                 : T
  allow_all_priv_modules          : F
  principal_auth_priv             : T
  server_trust_priv               : T
  app_name                        : 
  is_globally_managed             : F

Where did this dm_client_rights record come from? According to the log file $DM_HOME/install/logs/install.log, this record was created by the installer:

10:25:56,168  INFO [main] com.documentum.install.server.installanywhere.actions.DiWAServerCreateBofRegistryUser - Registering Client Roles.
10:25:56,198  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /u01/documentum/cs/shared/config/dfc.keystore
10:25:56,382  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
10:25:56,818  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /u01/documentum/cs/shared/config/dfc.keystore
10:25:56,844  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
10:25:57,148  INFO [main] com.documentum.fc.client.privilege.impl.PublicKeyCertificate - stored certificate for CN 
10:25:57,272  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling in DCTM_PSQL a new record with this persistent certificate:
-----BEGIN CERTIFICATE-----
MIIDHzCCAgcCEDM7pl2LftisOKZ3mYFjNigwDQYJKoZIhvcNAQELBQAwTjETMBEG
A1UECwwKRG9jdW1lbnR1bTEMMAoGA1UECgwDRU1DMSkwJwYDVQQDDCBkZmNfeHhw
YTdqaHRHRmtSZkh2clFtVUl5YWZmeGxrYTAeFw0xNzAzMTExMDEyMDVaFw0yNzAz
MDkxMDE3MDVaME4xEzARBgNVBAsMCkRvY3VtZW50dW0xDDAKBgNVBAoMA0VNQzEp
MCcGA1UEAwwgZGZjX3h4cGE3amh0R0ZrUmZIdnJRbVVJeWFmZnhsa2EwggEiMA0G
CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDPY3I1TzGodyI7Q7oHuLlA4N0IdLnp
oUEZqONy5Ev1f8RJLIJGowjSQme1HPrk6ZgdyBlGkLqguRdnn6hLrBIUiZSc7XRr
OM8xCYp6wEXUuxDTpX58Q32QLInjTjftHOblav201lOQStQUcrEAcUVZ+UK/Xt1t
Q9hQUpOvXWDRxBRPpN7VdTg1lCNuNC/BjYO/yBc2zUPYsarmnM1BcyeTi6RmlfME
PUsVPMqS8muBKP/o7ZUqXVMWNFNRHVbnOCX8KHZgO4DQLp7pcgYq0ak4vQ6BEfx8
fxo/egAS84JiemOxhslxytN5cBnFmc3NCdKKVzRcZE9fecC3DMt41dYfAgMBAAEw
DQYJKoZIhvcNAQELBQADggEBAARMI7W7ggw9GJZtfsqre5+idsfcxtuMPhI+bmU/
gAb3pyZug11z29AvCcprh545agcUQgm9wlgaJFUtiktd4n6hE12G46Vu/boqxy4J
iBs3kWQd2Qeh4Qobm8xvBu0VKSiHJRmbm5xslnq3yJorBZiNjvuoFVsaYtY24kiy
AxUB5y2vgUhZeLe+0WPrBEA3/I+ciGO/Jk6KXyL9vz8+04Hx6sBfkMsY1l8aa1HH
PtQdgfasysgVkIqCZ70zAXd5ARC4CXEwfhj6v/eq7X3CM4KCP4TiPqmzzapsLPn3
i0Or+fnwrOy/rYybndj0pgpnCbtinUZ7ZXmVtWDevMWey/Q=
-----END CERTIFICATE-----
10:25:57,280  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
10:25:57,659  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new registration record for dfc_xxpa7jhtGFkRfHvrQmUIyaffxlka
10:25:57,672  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - [DFC_SECURITY_GR_REGISTRATION_PUBLISH] this dfc instance is now published in the global registry DCTM_PSQL
10:25:57,695  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - assigning rights to all roles for this client on DCTM_PSQL
10:25:57,701  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
10:25:57,733  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: true
10:25:57,746  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
10:25:57,989  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new rights record for dfc_xxpa7jhtGFkRfHvrQmUIyaffxlka
10:25:58,015  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - [DFC_SECURITY_DOCBASE_RIGHTS_REGISTER] this dfc instance has now escalation rights registered with docbase DCTM_PSQL

How many dfc.keystore files do we have?

~]$ find /u01/documentum/cs/ -name dfc.keystore
/u01/documentum/cs/shared/config/dfc.keystore
.../ServerApps.ear/APP-INF/classes/dfc.keystore
.../com.emc.ide.external.dfc_1.0.0/documentum.config/dfc.keystore

How many dfc.properties files do we have?

~]$ find /u01/documentum/cs/ -name dfc.properties
/u01/documentum/cs/shared/config/dfc.properties
.../ServerApps.ear/APP-INF/classes/dfc.properties
.../com.emc.ide.external.dfc_1.0.0/documentum.config/dfc.properties

You might say the second one (the JMS one) is not actually a dfc.properties file because it looks like this:

#include /u01/documentum/cs/shared/config/dfc.properties
dfc.bof.classloader.enable_extension_loader_first=false

but it is. Moreover, the dfc.config.file read-only property defines the path to the dfc.properties file, and the dfc.config.dir read-only property defines the directory containing that file. Now, the only option whose default value depends on dfc.config.dir is:

# Fully qualified file name of the keystore file holding the PKI credentials for 
# DFC. 
# 
# Defaults to dfc.keystore in the same directory where the property file 
# (dfc.properies) is found.                                                     
# 
dfc.security.keystore.file = ${dfc.config.dir}/dfc.keystore

I think it is obvious that $DOCUMENTUM_SHARED/config/dfc.properties is misconfigured: it lacks the dfc.security.keystore.file entry, so every dfc.properties that resolves a different dfc.config.dir (the JMS one, for example) silently ends up with its own keystore and, hence, its own identity.
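A minimal fix is to make the keystore location explicit, so that every dfc.properties, wherever it lives, resolves to the same keystore; the path below is the one from this installation:

```properties
# pin the keystore location instead of relying on ${dfc.config.dir}
dfc.security.keystore.file=/u01/documentum/cs/shared/config/dfc.keystore
```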

Never happened before and happened again

As I described previously, in 2012 we were implementing a large Documentum project, and at that time we observed a lot of performance and stability issues (actually, it was a bit weird to observe such issues in a mature product maintained by a talented team). One of those issues was the following: it was not possible to restart JMS during business hours (from my perspective restarting JMS should not be a big deal if your application is properly designed: business users do not consume JMS, so bouncing it should not affect them):

  • about 5 out of 10 restarts ended with deadlocks in Java
  • about 4 out of 10 restarts ended with a crash of the java process with a SIGBUS error in libzip.so, something like:
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGBUS (0x7) at pc=0x00007f881a262982, pid=19993, tid=140222193350400
    #
    # JRE version: Java(TM) SE Runtime Environment (7.0_72-b14) (build 1.7.0_72-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libzip.so+0x4982]  newEntry+0x62
    #
    
  • and only 1 of 10 restarts was successful

So, we filed a couple of SRs, and after a while EMC support started claiming that the first two issues had been addressed in recent Documentum patches. Frankly speaking, during the last 4 years I haven’t seen JMS crashes with the SIGBUS error in libzip.so, but now I’m not sure that the talented team really fixed the issue, because two days ago our DevOps engineer claimed that Documentum 7.2 is still affected: he witnessed three consecutive crashes of JMS with the SIGBUS error in libzip.so. I have just lost faith in the talented team 😦

Well, let’s discuss this odd error and try to fix it 🙂 If you google for java + sigbus + libzip.so, you will find plenty of bugs in the OpenJDK JIRA, and all of them state something like:

  • the JVM maps the central directory of a zip file into memory using the mmap system call for performance reasons
  • it is expected that the application receives a SIGBUS error if the mapped file gets overwritten:
           SIGBUS Attempted access to a portion of the buffer that does not
                  correspond to the file (for example, beyond the end of the
                  file, including the case where another process has truncated
                  the file).
    
  • there is a -Dsun.zip.disableMemoryMapping=true system property which was designed to “prevent crashes”

So, the JDK seems to “work as designed”, nobody is going to fix this behaviour (not true for JDK 9, but for current JDK versions we must take this behaviour as given), and the real root cause of our problem is that “somebody” changes jar files on JMS while they are in use and thereby causes the crashes; this “somebody” is the odd TBO implementation. Is it possible to fix it? Actually, I have no idea what the talented team was doing four years ago, but I found a robust solution within four hours (if my memory serves me right, they were trying to fix the issue for four months). The good starting point here is to find or write a good testcase, and I found one in JDK-8017777 SIGBUS in java.util.zip.ZipFile.getEntry:

$ java ZipFileTest
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00000001054cfe92, pid=5775, tid=0x0000000000004d03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_111-b14) (build 1.8.0_111-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libzip.dylib+0x2e92]  newEntry+0x154
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/apanfilov/hs_err_pid5775.log
[thread 21251 also had an error]
[thread 22275 also had an error]
[thread 23299 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[thread 24323 also had an error]
Abort trap: 6

Cool! Now let’s check the behaviour of the JVM with the -Dsun.zip.disableMemoryMapping=true system property:

$ java -Dsun.zip.disableMemoryMapping=true ZipFileTest
Exception in thread "Thread-2" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-0" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)

Well, the JVM didn’t crash, but we got plenty of errors, so the -Dsun.zip.disableMemoryMapping=true system property only prevents crashes and seems to be unusable 😦 Here we need to stop blaming the JDK and start thinking about the following: what does “mapped file gets overwritten” actually mean, i.e. do the two following scenarios do the same thing or not:

$ file 1.txt
1.txt: empty
$ echo "talented team" > 1.txt
$ file 1.txt
1.txt: ASCII text
$ echo "talented team" > 1.txt.tmp
$ mv 1.txt.tmp 1.txt

Actually, the second scenario is safe from the JVM perspective: echo overwrites the file in place, i.e. modifies the very inode the JVM has mapped, while mv atomically replaces the directory entry, so the mapped pages keep referring to the old, intact file. A small patch for the initial testcase proves this:

@@ -79,9 +78,12 @@ public ZipFileWriter(File aFile) {
         public void run() {
             int i = 1;
             ZipOutputStream zOut;
+            File tmp;
             try {
-                zOut = new ZipOutputStream(new FileOutputStream(file));
-            } catch (FileNotFoundException e1) {
+                tmp = File.createTempFile(ZipFileTest.class.getSimpleName(),
+                        null, file.getParentFile());
+                zOut = new ZipOutputStream(new FileOutputStream(tmp));
+            } catch (IOException e1) {
                 e1.printStackTrace();
                 return;
             }
@@ -102,6 +104,9 @@ public void run() {
             } finally {
                 try {
                     zOut.close();
+                    if(!tmp.renameTo(file)) {
+                        System.err.println("Can't rename");
+                    }
                 } catch (IOException e) {
                 }
             }

and here we are:

$ java ZipFileTest
$ 
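The same write-to-a-temp-file-then-rename trick is available directly in java.nio.file; below is a minimal self-contained sketch (class and file names are mine):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SafeZipWriter {

    // writes the zip into a temp file in the same directory, then atomically
    // swaps it in; a reader that has mmap'ed the old file keeps seeing the
    // old inode, so no SIGBUS
    public static void writeZip(Path target, byte[] payload) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), "zip", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp);
             ZipOutputStream zOut = new ZipOutputStream(out)) {
            zOut.putNextEntry(new ZipEntry("entry.bin"));
            zOut.write(payload);
            zOut.closeEntry();
        }
        // temp file lives on the same filesystem, so the rename is atomic
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("safezip");
        Path target = dir.resolve("test.zip");
        writeZip(target, new byte[]{1, 2, 3});
        writeZip(target, new byte[]{4, 5, 6}); // overwrite is a rename, not a truncate
        System.out.println(Files.size(target) > 0); // prints true
    }
}
```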

What about DFC? Javassist rocks!

BugMakers :)

Frankly speaking, when I was writing the previous blogpost I was surprised to discover that the DQL UPDATE statement preserves the value of the r_lock_owner attribute:

API> revert,c,09024be98006b104
...
OK
API> get,c,09024be98006b104,r_lock_owner
...
dmadmin
API> ?,c,update dm_document objects set object_name='xxx' where r_object_id='09024be98006b104'
objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."


API> revert,c,09024be98006b104
...
OK
API> get,c,09024be98006b104,r_lock_owner
...
dmadmin
API> 

Unfortunately, this is not true when you update objects whose behaviour is customized via a TBO:

API> retrieve,c,bs_doc_cash
...
09bc2c71806d6ffe
API> checkout,c,l
...
09bc2c71806d6ffe
API> ?,c,update bs_doc_cash objects set object_name='xxx' where r_object_id='09bc2c71806d6ffe'
objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."


API> revert,c,09bc2c71806d6ffe
...
OK
API> get,c,09bc2c71806d6ffe,r_lock_owner
...

API>

but:

API> checkout,c,09bc2c71806d6ffe
...
09bc2c71806d6ffe
API> apply,c,,EXEC,QUERY,S,update bs_doc_cash objects set object_name='xxx' where r_object_id='09bc2c71806d6ffe',BOF_DQL,B,F
...
q0
API> next,c,q0
...
OK
API> dump,c,q0
...
USER ATTRIBUTES

  objects_updated                 : 1

SYSTEM ATTRIBUTES


APPLICATION ATTRIBUTES


INTERNAL ATTRIBUTES


API> revert,c,09bc2c71806d6ffe
...
OK
API> get,c,09bc2c71806d6ffe,r_lock_owner
...
dmadmin
API> 

Explanation for dfc.diagnostics.resources.enable

It is a bit weird, but everybody who has previously asked me about session leaks failed to understand the root cause of the problem; the typical question was: “we started getting a lot of DM_API_E_NO_SESSION errors, we increased the value of the concurrent_sessions parameter in server.ini, but this didn’t help”. Well, I have written this blogpost in order to shed some light on session leaks in DFC. First of all, you may face the two following errors:

DfServiceException:: THREAD: main; MSG: [DM_SESSION_E_MAX_SESSIONS_EXCEEDED]
  error:  "Maximum number of server sessions exceeded"; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:44)
	at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:34)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.newSessionByAddr(NetwiseDocbaseRpcClient.java:118)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.beginSession(DocbaseConnection.java:299)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.open(DocbaseConnection.java:130)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.<init>(DocbaseConnection.java:100)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.<init>(DocbaseConnection.java:60)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionFactory.newDocbaseConnection(DocbaseConnectionFactory.java:26)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.createNewConnection(DocbaseConnectionManager.java:180)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:110)
	at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
	at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
	at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
	at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
	at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
	at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)

and

DfServiceException:: THREAD: main; MSG: [DM_API_E_NO_SESSION]
  error:  "There are no more available sessions."; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getConnectionFromPool(DocbaseConnectionManager.java:168)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:94)
	at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
	at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
	at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
	at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
	at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
	at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)

The first one (DM_SESSION_E_MAX_SESSIONS_EXCEEDED) means that you are running out of Content Server sessions (the concurrent_sessions parameter in server.ini); the second one (DM_API_E_NO_SESSION) means that you are running out of DFC sessions (the dfc.session.max_count parameter in dfc.properties). The problem is that either error may be caused by incorrect sizing or by a programming mistake. So, before investigating session leaks you need to rule out a sizing issue. That might be a bit challenging, because DFC does not provide any runtime statistics about sessions (recent DFC versions support the dfc.connection.profile_connections and dfc.connection.trace_connections_only flags in dfc.properties, which cause the com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager class to write extra debug information into log files, but I would prefer something more reliable, JMX for example). So, let’s pretend that a sizing issue is not our case. What is next?
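For the record, publishing such counters over JMX takes only a few lines; below is a minimal sketch of what I have in mind (the bean, its name and its numbers are entirely made up, DFC exposes nothing of the kind; a real implementation would hook into the session factory’s acquire/release paths):

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;

import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class SessionStats {

    // hypothetical management interface
    public interface Stats {
        int getActiveSessions();
        int getMaxSessions();
    }

    private final AtomicInteger active = new AtomicInteger();

    public void acquired() { active.incrementAndGet(); }

    public void released() { active.decrementAndGet(); }

    public int getActive() { return active.get(); }

    public void publish() throws Exception {
        Stats stats = new Stats() {
            public int getActiveSessions() { return active.get(); }
            public int getMaxSessions() { return 50; } // dfc.session.max_count
        };
        // StandardMBean spares us the FooMBean naming convention
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new StandardMBean(stats, Stats.class),
                new ObjectName("dfc.monitoring:type=SessionStats"));
    }

    public static void main(String[] args) throws Exception {
        SessionStats stats = new SessionStats();
        stats.publish();
        stats.acquired();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println(server.getAttribute(
                new ObjectName("dfc.monitoring:type=SessionStats"),
                "ActiveSessions")); // prints 1
    }
}
```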

Official documentation (the DFC Development Guide) states the following:

and there are at least two mistakes:

  1. it also logs information about leaks of stream objects created through the ISysObjectInternal#getStream method; the corresponding log message looks like:
    [Resource Housekeeper] ERROR com.documentum.fc.impl.util.io.InputStreamHandle$DisposableData  - [DFC_STREAM_NOT_CLOSED] Unclosed stream found during garbage collection, com.documentum.fc.client.content.impl.PullerInputStream@28aac04e.
    com.documentum.fc.impl.util.ThrowableStack: Stack when stream was created
    	at com.documentum.fc.client.content.impl.ContentInputStream.<init>(ContentInputStream.java:17)
    	at com.documentum.fc.client.content.impl.PusherPullerContentAccessor.buildStreamFromContext(PusherPullerContentAccessor.java:46)
    	at com.documentum.fc.client.content.impl.PusherPullerContentAccessor.getStream(PusherPullerContentAccessor.java:28)
    	at com.documentum.fc.client.content.impl.ContentAccessorFactory.getStream(ContentAccessorFactory.java:37)
    	at com.documentum.fc.client.content.impl.Store.getStream(Store.java:63)
    	at com.documentum.fc.client.content.impl.FileStore___PROXY.getStream(FileStore___PROXY.java)
    	at com.documentum.fc.client.content.impl.Content.getStream(Content.java:185)
    	at com.documentum.fc.client.content.impl.Content___PROXY.getStream(Content___PROXY.java)
    	at com.documentum.fc.client.content.impl.ContentManager.getStream(ContentManager.java:84)
    	at com.documentum.fc.client.content.impl.ContentManager.getStream(ContentManager.java:53)
    	at com.documentum.fc.client.DfSysObject.getStream(DfSysObject.java:1994)
  2. the last statement is incorrect; the correct one is: if DFC weren’t buggy, it would place an appropriate message in the log

So, what happens when we enable dfc.diagnostics.resources.enable? The short answer is: if your code suffers from leaks of transaction sessions, DFC replaces them with a memory leak 🙂

Now the long answer. DFC manages the following types of sessions:

  • com.documentum.fc.client.impl.session.Session
  • com.documentum.fc.client.impl.session.WeakSessionHandle
  • com.documentum.fc.client.impl.session.StrongSessionHandle
  • com.documentum.fc.client.impl.session.DynamicSessionHandle

the first one is the real/original session; the last three are decorators. We typically deal with StrongSessionHandle (IDfSessionManager#getSession) and DynamicSessionHandle (IDfTypedObject#getSession):

@Test
public void test() throws Exception {
    IDfClientX clientX = new DfClientX();
    IDfClient client = clientX.getLocalClient();
    IDfSessionManager sessionManager = client.newSessionManager();
    IDfLoginInfo loginInfo = clientX.getLoginInfo();
    loginInfo.setUser("dmadmin");
    loginInfo.setPassword("dmadmin");
    sessionManager.setIdentity("DCTM_DEV", loginInfo);
    IDfSession session = sessionManager.getSession("DCTM_DEV");
    System.out.println("session managers return "
            + session.getClass().getName());
    IDfSysObject object = (IDfSysObject) session
            .getObjectByQualification("dm_document");
    System.out.println("typed objects store "
            + object.getSession().getClass().getName());
    sessionManager.release(session);
    sessionManager.flushSessions();
    System.out.println("we may use dynamic session handles "
            + "even if original session was released");
    object.getString("object_name");
    object.save();
    System.out.println("we may not use strong session handles "
            + "if it was released");
    object = (IDfSysObject) session.getObjectByQualification("dm_document");
}

session managers return com.documentum.fc.client.impl.session.StrongSessionHandle
typed objects store com.documentum.fc.client.impl.session.DynamicSessionHandle
we may use dynamic session handles even if original session was released
we may not use strong session handles if it was released

com.documentum.fc.common.DfRuntimeException: Using session handle that has already been released

	at com.documentum.fc.client.impl.session.StrongSessionHandle.referenceSession(StrongSessionHandle.java:64)
	at com.documentum.fc.client.impl.session.SessionHandle.referenceScopableSession(SessionHandle.java:74)
	at com.documentum.fc.client.impl.session.SessionHandle.getObjectByQualification(SessionHandle.java:752)

And when we enable dfc.diagnostics.resources.enable, DFC starts keeping track of StrongSessionHandle instances, i.e. if an instance of the StrongSessionHandle class wasn’t properly released, DFC logs a corresponding error. How would you implement such functionality? Actually, the implementation is pretty straightforward: we need to know the circumstances under which our object was constructed, so we save the construction stack into the object and log that stack in the finalize() method, something like:

import java.io.Closeable;
import java.io.IOException;

import org.junit.Test;

public class FinalizationDemo {

    private final boolean _diagnosticsEnabled = true;

    @Test
    public void test() throws Exception {
        try {
            while (true) {
                IResource resource = getResource();
            }
        } catch (OutOfMemoryError error) {
            error.printStackTrace();
            System.exit(1);
        }
    }

    private IResource getResource() {
        return new Resource();
    }

    interface IResource extends Closeable {

        boolean isClosed();

    }

    class Resource implements IResource {

        private boolean _closed;

        private Throwable _construction;

        private final byte[] data = new byte[1000 * 100];

        public Resource() {
            if (_diagnosticsEnabled) {
                _construction = new Exception();
            }
        }

        @Override
        public void close() throws IOException {
            System.out.println("Closed");
            _closed = true;
            _construction = null;
        }

        @Override
        public boolean isClosed() {
            return _closed;
        }

        @Override
        protected void finalize() throws Throwable {
            if (_construction != null && !isClosed()) {
                _construction.printStackTrace();
            }
        }
    }

}

ECD’s implementation is different: instead of using finalizers they use weak references; moreover, before logging the error message they try to clean up the resources (such behaviour suffers from the side effects described in the previous post), and their code looks like:

import java.io.Closeable;
import java.io.IOException;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

import org.junit.Test;

public class DisposableDemo {

    private final ReferenceQueue<Object> _queue = new ReferenceQueue<>();
    private final Map<Reference<?>, IDisposable> _data = Collections
            .synchronizedMap(new IdentityHashMap<Reference<?>, IDisposable>());

    private final boolean _diagnosticsEnabled = true;

    public DisposableDemo() {
        Thread thread = new Thread(new DisposingTask());
        thread.setPriority(Thread.MAX_PRIORITY);
        thread.start();
    }

    class DisposingTask implements Runnable {
        @Override
        public void run() {
            try {
                Reference reference;
                while ((reference = _queue.remove()) != null) {
                    IDisposable disposable = _data.remove(reference);
                    if (disposable == null) {
                        continue;
                    }
                    try {
                        disposable.dispose();
                    } catch (IOException ex) {
                        ex.printStackTrace();
                    }
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void register(Object resource, IDisposable disposable) {
        WeakReference reference = new WeakReference(resource, _queue);
        _data.put(reference, disposable);
    }

    @Test
    public void test() throws Exception {
        try {
            while (true) {
                IResource resource = getResource();
                // System.gc();
            }
        } catch (OutOfMemoryError error) {
            error.printStackTrace();
            System.exit(1);
        }
    }

    private IResource getResource() {
        IResource resource = new Resource();
        ResourceDecorator decorator = new ResourceDecorator(resource);
        if (_diagnosticsEnabled) {
            register(decorator, decorator._disposable);
        }
        return decorator;
    }

    interface IDisposable {

        void dispose() throws IOException;

    }

    interface IResource extends Closeable {

        boolean isClosed();

    }

    class Resource implements IResource {

        private boolean _closed;

        private final byte[] data = new byte[1000 * 100];

        @Override
        public void close() throws IOException {
            System.out.println("Closed");
            _closed = true;
        }

        @Override
        public boolean isClosed() {
            return _closed;
        }

    }

    class ResourceDecorator implements IResource {

        private final IResource _resource;

        private final IDisposable _disposable;

        public ResourceDecorator(IResource resource) {
            _resource = resource;
            if (_diagnosticsEnabled) {
                _disposable = new Disposable(_resource);
            } else {
                _disposable = null;
            }
        }

        @Override
        public void close() throws IOException {
            _resource.close();
        }

        @Override
        public boolean isClosed() {
            return _resource.isClosed();
        }
    }

    // !!! must not refer to ResourceDecorator
    static class Disposable implements IDisposable {

        private IResource _resource;

        private final Throwable _construction;

        public Disposable(IResource resource) {
            _resource = resource;
            _construction = new Exception();
        }

        @Override
        public void dispose() throws IOException {
            if (!_resource.isClosed()) {
                // closing resource
                _resource.close();
                // log construction stacktrace
                _construction.printStackTrace();
            }
        }

    }

}

Side effects of dfc.diagnostics.resources.enable

Assuming that I set correct credentials and we know how to set DFC preferences at runtime, what does the code below print?

@Test
public void test() throws Exception {
    int count = 0;
    try {
        DfPreferences preferences = DfPreferences.getInstance();
        preferences.loadProperty(DfPreferences.DFC_SESSION_MAX_COUNT, 50);
        while (++count > 0) {
            IDfClientX clientX = new DfClientX();
            IDfClient client = clientX.getLocalClient();
            IDfSessionManager sessionManager = client.newSessionManager();
            IDfLoginInfo loginInfo = clientX.getLoginInfo();
            loginInfo.setUser("dmadmin");
            loginInfo.setPassword("dmadmin");
            sessionManager.setIdentity("DCTM_DEV", loginInfo);
            IDfSession session = sessionManager.getSession("DCTM_DEV");
        }
    } finally {
        System.out.println("Session count: " + count);
    }
}

Actually, the answer is obvious – this buggy code creates 50 sessions and then fails:

Session count: 50

DfServiceException:: THREAD: main; MSG: [DM_API_E_NO_SESSION]error:  "There are no more available sessions."
; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getConnectionFromPool(DocbaseConnectionManager.java:168)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:94)
	at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
	at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
	at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
	at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
	at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
	at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)
	at SessionDemo.test(SessionDemo.java:30)
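
The leak in all of these snippets is the same: the session is obtained but never returned to the pool. For reference, a minimal sketch of the correct pattern (same assumed repository name and credentials as above) releases the session in a finally block:

```java
IDfSession session = sessionManager.getSession("DCTM_DEV");
try {
    // work with the session
} finally {
    sessionManager.release(session); // return the session to the pool
}
```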

Now, a less obvious example:

@Test
public void test() throws Exception {
    int count = 0;
    try {
        DfPreferences preferences = DfPreferences.getInstance();
        preferences.loadProperty(DfPreferences.DFC_SESSION_MAX_COUNT, 50);
        preferences.loadProperty(DfPreferences.DFC_DIAGNOSTICS_RESOURCES_ENABLE, true);
        while (++count > 0) {
            IDfClientX clientX = new DfClientX();
            IDfClient client = clientX.getLocalClient();
            IDfSessionManager sessionManager = client.newSessionManager();
            IDfLoginInfo loginInfo = clientX.getLoginInfo();
            loginInfo.setUser("dmadmin");
            loginInfo.setPassword("dmadmin");
            sessionManager.setIdentity("DCTM_DEV", loginInfo);
            IDfSession session = sessionManager.getSession("DCTM_DEV");
        }
    } finally {
        System.out.println("Session count: " + count);
    }
}

the result is a bit confusing – the code creates more than 50 sessions before failing:

...
25586 [Resource Housekeeper] ERROR com.documentum.fc.client.impl.session.StrongSessionHandle$DisposableData  - [DFC_SESSION_NOT_RELEASED] Unreleased session found during garbage collection, Session{id=29, iid=29, serial=1, docbase=DCTM_DEV, user=dmadmin, serversession=none}
com.documentum.fc.impl.util.ThrowableStack: Stack when session was obtained
	at SessionDemo.test(SessionDemo.java:31)
...

Session count: 94

DfServiceException:: THREAD: main; MSG: [DM_API_E_NO_SESSION]error:  "There are no more available sessions."
; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getConnectionFromPool(DocbaseConnectionManager.java:168)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:94)
	at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
	at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
	at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
	at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
	at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
	at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)
	at SessionDemo.test(SessionDemo.java:31)

another example:

@Test
public void test() throws Exception {
    int count = 0;
    try {
        DfPreferences preferences = DfPreferences.getInstance();
        preferences.loadProperty(DfPreferences.DFC_SESSION_MAX_COUNT, 50);
        preferences.loadProperty(DfPreferences.DFC_DIAGNOSTICS_RESOURCES_ENABLE, true);
        while (++count > 0) {
            IDfClientX clientX = new DfClientX();
            IDfClient client = clientX.getLocalClient();
            IDfSessionManager sessionManager = client.newSessionManager();
            IDfLoginInfo loginInfo = clientX.getLoginInfo();
            loginInfo.setUser("dmadmin");
            loginInfo.setPassword("dmadmin");
            sessionManager.setIdentity("DCTM_DEV", loginInfo);
            IDfSession session = sessionManager.getSession("DCTM_DEV");
            System.gc();
        }
    } finally {
        System.out.println("Session count: " + count);
    }
}

the result is weird – the code prints a lot of diagnostic messages but does not fail:

...
115902 [Resource Housekeeper] ERROR com.documentum.fc.client.impl.session.StrongSessionHandle$DisposableData  - [DFC_SESSION_NOT_RELEASED] Unreleased session found during garbage collection, Session{id=1, iid=1207, serial=1, docbase=DCTM_DEV, user=dmadmin, serversession=none}
com.documentum.fc.impl.util.ThrowableStack: Stack when session was obtained
	at SessionDemo.test(SessionDemo.java:31)
...

and the last example:

@Test
public void test() throws Exception {
    int count = 0;
    try {
        DfPreferences preferences = DfPreferences.getInstance();
        preferences.loadProperty(DfPreferences.DFC_SESSION_MAX_COUNT, 50);
        preferences.loadProperty(DfPreferences.DFC_DIAGNOSTICS_RESOURCES_ENABLE, true);
        while (++count > 0) {
            IDfClientX clientX = new DfClientX();
            IDfClient client = clientX.getLocalClient();
            IDfSessionManager sessionManager = client.newSessionManager();
            IDfLoginInfo loginInfo = clientX.getLoginInfo();
            loginInfo.setUser("dmadmin");
            loginInfo.setPassword("dmadmin");
            sessionManager.setIdentity("DCTM_DEV", loginInfo);
            IDfSession session = sessionManager.getSession("DCTM_DEV");
            session.beginTrans();
            System.gc();
        }
    } finally {
        System.out.println("Session count: " + count);
    }
}

the result is even weirder – the code does not print diagnostic messages, yet still fails:

4964 [Resource Housekeeper] ERROR com.documentum.fc.client.impl.connection.docbase.DocbaseConnection  - Connection put back in pool with transaction active, Connection{id=1, docbase=DCTM_DEV, user=dmadmin, state=FREE}

Session count: 50

DfServiceException:: THREAD: main; MSG: [DM_API_E_NO_SESSION]error:  "There are no more available sessions."
; ERRORCODE: 100; NEXT: null
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getConnectionFromPool(DocbaseConnectionManager.java:168)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:94)
	at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
	at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
	at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
	at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
	at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
	at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)
	at SessionDemo.test(SessionDemo.java:31)

Corresponding thread dump:

"Resource Housekeeper" #13 daemon prio=10 os_prio=31 tid=0x00007fb78c1bd800 nid=0x5703 in Object.wait() [0x0000700001452000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000006c0dc9828> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.waitForCorrectSharingContext(DocbaseConnection.java:820)
	- locked <0x00000006c0dc9828> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1108)
	- locked <0x00000006c0dc9828> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForBool(DocbaseConnection.java:1198)
	- locked <0x00000006c0dc9828> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.apply(DocbaseConnection.java:1183)
	at com.documentum.fc.client.impl.docbase.DocbaseApi.endTrans(DocbaseApi.java:1757)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.cleanupFromUnexpectedTransactionState(DocbaseConnection.java:729)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.quiesce(DocbaseConnection.java:663)
	- locked <0x00000006c0dc9828> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.returnConnectionToPool(DocbaseConnectionManager.java:663)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.release(DocbaseConnectionManager.java:270)
	at com.documentum.fc.client.impl.session.Session.releaseConnection(Session.java:248)
	at com.documentum.fc.client.impl.session.PooledSessionFactory$SessionListener.onDisconnect(PooledSessionFactory.java:114)
	at com.documentum.fc.client.impl.session.Session.notifyListenersOfDisconnect(Session.java:395)
	at com.documentum.fc.client.impl.session.Session.disconnect(Session.java:312)
	at com.documentum.fc.client.impl.session.Session.decrementReferenceCount(Session.java:3687)
	at com.documentum.fc.client.impl.session.StrongSessionHandle$DisposableData.dispose(StrongSessionHandle.java:127)
	at com.documentum.fc.internal.reshousekeeper.ResourceHousekeeper$1$1.run(ResourceHousekeeper.java:52)

Try to explain these results 🙂

Copying version tree. Part II

Wow, Marian Caikovski has found a bug in my old blogpost about copying version trees 🙂 Though it is not clear what “The described method will work even if the root version in the source object has been previously deleted. But the copies will not be entirely valid.” means, because I observe the following behaviour:

Connected to Documentum Server running Release 7.3.0000.0214  Linux64.Postgres
Session id is s0
API> create,c,dm_document
...
0902987880002cf1
API> save,c,l
...
OK
API> checkout,c,l
...
0902987880002cf1
API> checkin,c,l
...
0902987880002cfc
API> fetch,c,0902987880002cfc
...
OK
API> checkout,c,l
...
0902987880002cfc
API> checkin,c,l
...
0902987880002cfe
API> destroy,c,0902987880002cf1
...
OK
API> fetch,c,0902987880002cf1
...
OK
API> saveasnew,c,l
...
[DM_SYSOBJECT_E_IS_DELETED]error:  
  "The sysobject '' has been deleted and can not be updated/deleted."

which can be fixed using the following trick:

API> set,c,l,i_is_deleted
SET> F   
...
OK
API> saveasnew,c,l
...
0902987880002cf3
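
The same trick can be attempted through DFC. Note the assumption here: it is not certain that DFC allows modifying the internal i_is_deleted attribute via setBoolean – the IAPI session above is the verified route – but the sketch would look like this, reusing the root version id from the transcript:

```java
// fetch the destroyed root version by its r_object_id
IDfSysObject root = (IDfSysObject) session.getObject(new DfId("0902987880002cf1"));
// clear the deleted flag so saveAsNew no longer rejects the object
root.setBoolean("i_is_deleted", false);
// create the copy without sharing content
IDfId copyId = root.saveAsNew(false);
```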

The complete code is available on github.