Weird DQL grammar. Part III

Today I noticed another odd behaviour:

API> ?,c,select count(*) from dm_sysobject 
  where a_is_template=FALSE enable(GENERATE_SQL_ONLY)
generate_sql
-------------------------------------------------
select all count(*) from dm_sysobject_sp  dm_sysobject 
  where ((dm_sysobject.a_is_template=0)) 
  and (dm_sysobject.i_has_folder = 1 and dm_sysobject.i_is_deleted = 0) 
(1 row affected)

API> ?,c,select count(*) from dm_sysobject 
  where a_is_template=0 enable(GENERATE_SQL_ONLY)
generate_sql
-------------------------------------------------
select all count(*) from dm_sysobject_sp  dm_sysobject 
  where ((dm_sysobject.a_is_template=0)) 
  and (dm_sysobject.i_has_folder = 1 and dm_sysobject.i_is_deleted = 0) 
(1 row affected)

BUT:

API> ?,c,select count(*) from dm_dbo.dm_sysobject_s 
  where a_is_template=FALSE enable(GENERATE_SQL_ONLY)
generate_sql
-------------------------------------------------
select all count(*) from docbase.dm_sysobject_s 
  where ((a_is_template=0 OR a_is_template IS NULL)) 
(1 row affected)

API> ?,c,select count(*) from dm_dbo.dm_sysobject_s 
  where a_is_template=0 enable(GENERATE_SQL_ONLY)
generate_sql
-------------------------------------------------
select all count(*) from docbase.dm_sysobject_s 
  where ((a_is_template=0 OR a_is_template IS NULL)) 
(1 row affected)

i.e. Content Server thinks it is smart enough, but in the case of registered tables it generates wrong SQL: the extra “OR a_is_template IS NULL” predicate also matches rows where the attribute is NULL, which is not what was asked for. There are two options to “override” such behaviour:

  • write odd DQL, like:
    API> ?,c,select count(*) from dm_dbo.dm_sysobject_s 
      where (a_is_template=FALSE and a_is_template IS NOT NULL) enable(GENERATE_SQL_ONLY)
    generate_sql                                                                                                                             
    -------------------------------------------------
    select all count(*) from bee_ssc_dev.dm_sysobject_s 
      where (((a_is_template=0 OR a_is_template IS NULL) and a_is_template is not null)) 
    (1 row affected)
    
  • write tricky DQL:
    API> ?,c,select count(*) from dm_dbo.dm_sysobject_s 
      where a_is_template IN (FALSE) enable(GENERATE_SQL_ONLY)
    generate_sql                                                                     
    -------------------------------------------------
    select all count(*) from bee_ssc_dev.dm_sysobject_s 
      where (a_is_template in (0)) 
    (1 row affected)
    

Never happened before and happened again

As I described previously, in 2012 we were implementing a large Documentum project, and at that time we were observing a lot of performance and stability issues (actually, it was a bit weird to observe such issues in a mature product maintained by a talented team). One of the issues was the following: it was not possible to restart JMS during business hours (from my perspective restarting JMS is not a big deal if your application is properly designed – business users do not consume JMS directly, so bouncing JMS should not affect them):

  • about 5 of 10 restarts ended with deadlocks in java
  • about 4 of 10 restarts ended with a crash of the java process with a SIGBUS error in libzip.so, something like:
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGBUS (0x7) at pc=0x00007f881a262982, pid=19993, tid=140222193350400
    #
    # JRE version: Java(TM) SE Runtime Environment (7.0_72-b14) (build 1.7.0_72-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libzip.so+0x4982]  newEntry+0x62
    #
    
  • and only 1 of 10 restarts was successful

So, we filed a couple of SRs, and after a while EMC support started claiming that the first two issues had been addressed in recent Documentum patches. Frankly speaking, during the last 4 years I hadn’t seen JMS crashes with a SIGBUS error in libzip.so, but now I’m not sure that the talented team had really fixed the issue, because two days ago our DevOps engineer claimed that Documentum 7.2 is still affected and witnessed three successive crashes of JMS with a SIGBUS error in libzip.so – I have just lost faith in the talented team 😦

Well, let’s discuss this odd error and try to fix it 🙂 If you google for java + sigbus + libzip.so you will find plenty of bugs in the openjdk JIRA, and all of them state something like:

  • the JVM maps the central directory of a zip file into memory using the mmap system call for performance reasons
  • it is expected that the application receives a SIGBUS error if the mapped file gets overwritten (the sketch after this list makes the scenario concrete):
           SIGBUS Attempted access to a portion of the buffer that does not
                  correspond to the file (for example, beyond the end of the
                  file, including the case where another process has truncated
                  the file).
    
  • there is a -Dsun.zip.disableMemoryMapping=true system property which was designed to “prevent crashes”
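
In essence, the crash scenario boils down to one thread rewriting a zip file in place while another thread iterates its entries. Here is a minimal sketch in the spirit of the JDK testcases (class and file names are made up, and since the failure depends on timing it may take several runs to actually crash):

import java.io.File;
import java.io.FileOutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipCrashSketch {

    // writes ~1000 entries into the given file, overwriting it in place
    private static void writeZip(File file) throws Exception {
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file));
        try {
            for (int i = 0; i < 1000; i++) {
                out.putNextEntry(new ZipEntry("entry-" + i));
                out.write(new byte[64]);
                out.closeEntry();
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        final File file = File.createTempFile("zipfile", ".zip");
        writeZip(file);

        Thread writer = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        writeZip(file); // in-place overwrite – the unsafe part
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        Thread reader = new Thread() {
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        // iterating entries touches the mmap'ed central directory
                        ZipFile zip = new ZipFile(file);
                        for (Enumeration<? extends ZipEntry> e = zip.entries();
                                e.hasMoreElements();) {
                            e.nextElement().getName();
                        }
                        zip.close();
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}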

So, the JDK seems to “work as designed”, nobody is going to fix this behaviour (not true for JDK9, but for current JDK versions we must take this behaviour as a given), and the real root cause of our problem is that “somebody” changes jar files on JMS while they are in use and, hence, causes the crashes – and this “somebody” is an odd TBO implementation. Is it possible to fix it? Actually, I have no idea what the talented team was doing four years ago, but I found a robust solution within four hours (if my memory serves me right, they had been trying to fix the issue for four months) – the good starting point here is to find/write a good testcase, and I found one in JDK-8017777 SIGBUS in java.util.zip.ZipFile.getEntry:

$ java ZipFileTest
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00000001054cfe92, pid=5775, tid=0x0000000000004d03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_111-b14) (build 1.8.0_111-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libzip.dylib+0x2e92]  newEntry+0x154
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/apanfilov/hs_err_pid5775.log
[thread 21251 also had an error]
[thread 22275 also had an error]
[thread 23299 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[thread 24323 also had an error]
Abort trap: 6

Cool! Now let’s check the behaviour of the JVM with the -Dsun.zip.disableMemoryMapping=true system property:

$ java -Dsun.zip.disableMemoryMapping=true ZipFileTest
Exception in thread "Thread-2" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-0" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)

Well, the JVM didn’t crash, but we got plenty of errors, so the -Dsun.zip.disableMemoryMapping=true system property merely prevents crashes and seems to be unusable 😦 Here we need to stop blaming the JDK and start thinking about the following: what does “mapped file gets overwritten” actually mean, i.e. do the two scenarios below do the same thing or not?

$ file 1.txt
1.txt: empty
$ echo "talented team" > 1.txt
$ file 1.txt
1.txt: ASCII text
$ echo "talented team" > 1.txt.tmp
$ mv 1.txt.tmp 1.txt
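
In plain Java the second scenario is the temp-file-plus-rename idiom – a minimal sketch (entry name and payload are illustrative; this is the same trick the patch below applies to the testcase):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SafeZipRewrite {

    public static void rewrite(File target) throws IOException {
        // the temporary file must reside in the same directory as the target,
        // otherwise the rename is not guaranteed to be atomic
        File tmp = File.createTempFile("rewrite", ".tmp", target.getParentFile());
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(tmp));
        try {
            out.putNextEntry(new ZipEntry("entry"));
            out.write("payload".getBytes("UTF-8"));
            out.closeEntry();
        } finally {
            out.close();
        }
        // readers that already mapped the old file keep a valid mapping of the
        // old inode; only new opens see the new content
        if (!tmp.renameTo(target)) {
            throw new IOException("can't rename " + tmp + " to " + target);
        }
    }
}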

Actually, the second scenario is safe from the JVM’s perspective: mv replaces the directory entry, while the old inode – the one the JVM has mapped – stays intact until it is unmapped. A small patch for the initial testcase proves this:

@@ -79,9 +78,12 @@ public ZipFileWriter(File aFile) {
         public void run() {
             int i = 1;
             ZipOutputStream zOut;
+            File tmp;
             try {
-                zOut = new ZipOutputStream(new FileOutputStream(file));
-            } catch (FileNotFoundException e1) {
+                tmp = File.createTempFile(ZipFileTest.class.getSimpleName(),
+                        null, file.getParentFile());
+                zOut = new ZipOutputStream(new FileOutputStream(tmp));
+            } catch (IOException e1) {
                 e1.printStackTrace();
                 return;
             }
@@ -102,6 +104,9 @@ public void run() {
             } finally {
                 try {
                     zOut.close();
+                    if(!tmp.renameTo(file)) {
+                        System.err.println("Can't rename");
+                    }
                 } catch (IOException e) {
                 }
             }

and here we are:

$ java ZipFileTest
$ 

What about DFC? Javassist rocks!

Weird DQL grammar. Part II

Typically Documentum ignores DQL hints which are present in subqueries (though in most cases the return_top hint is useless anyway):

1> select count(*) from
2> (select user_name from dm_user enable(return_top 10))
3> go
count(*)              
----------------------
                    66
(1 row affected)

also, Documentum fails to parse the union keyword in subqueries:

1> select count(*) from
2> (select user_name from dm_user union select group_name from dm_group)
3> go
[DM_QUERY_E_SYNTAX]error:  
  "A Parser Error (syntax error) has occurred in the vicinity of:  
   select count(*) from (select user_name from dm_user union"

but:

1> select count(*) from
2> (select user_name from dm_user enable(return_range 1 10 'user_name'))
3> go
count(*)              
----------------------
                    10
(1 row affected)

BugMakers :)

Frankly speaking, when I was writing the previous blogpost I was surprised to discover that the DQL update statement preserves the value of the r_lock_owner attribute:

API> revert,c,09024be98006b104
...
OK
API> get,c,09024be98006b104,r_lock_owner
...
dmadmin
API> ?,c,update dm_document objects set object_name='xxx' where r_object_id='09024be98006b104'
objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."


API> revert,c,09024be98006b104
...
OK
API> get,c,09024be98006b104,r_lock_owner
...
dmadmin
API> 

Unfortunately, this is not true when you update objects whose behaviour is customized via a TBO:

API> retrieve,c,bs_doc_cash
...
09bc2c71806d6ffe
API> checkout,c,l
...
09bc2c71806d6ffe
API> ?,c,update bs_doc_cash objects set object_name='xxx' where r_object_id='09bc2c71806d6ffe'
objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."


API> revert,c,09bc2c71806d6ffe
...
OK
API> get,c,09bc2c71806d6ffe,r_lock_owner
...

API>

but:

API> checkout,c,09bc2c71806d6ffe
...
09bc2c71806d6ffe
API> apply,c,,EXEC,QUERY,S,update bs_doc_cash objects set object_name='xxx' where r_object_id='09bc2c71806d6ffe',BOF_DQL,B,F
...
q0
API> next,c,q0
...
OK
API> dump,c,q0
...
USER ATTRIBUTES

  objects_updated                 : 1

SYSTEM ATTRIBUTES


APPLICATION ATTRIBUTES


INTERNAL ATTRIBUTES


API> revert,c,09bc2c71806d6ffe
...
OK
API> get,c,09bc2c71806d6ffe,r_lock_owner
...
dmadmin
API> 

javassist

Six months ago Marian Caikovski shared his experience with java decompilers and advertised the CFR decompiler: Decompiling jars obfuscated with AspectJ (e.g. D2FS4DCTM-WEB-4.5.0.jar or dfc.jar). A few clicks away from the CFR decompiler page (FAQ -> Procyon / Java Decompiler -> Konloch/bytecode-viewer) you may find another cool project intended to provide a universal GUI for java decompilers – Konloch/bytecode-viewer.

Well, why is this blogpost named “javassist”? The problem is I do not trust decompilers, because some of them produce completely wrong results. For example, I have noticed that a lot of people like the jd-gui decompiler because it has a cool GUI, but try to decompile the following code in jd-gui:

public final class Test { 
    public static void main(String[] argv) throws Exception {
        int i = 0;
        while (++i > 0) {
            try {
                if (i < 10) {
                    throw new Exception("xxx");
                }
                System.out.print("xxx");
                break;
            } catch (Exception e) {
                if (!(i < 10)) {
                    throw e;
                }
            }
        }
    }
}

and you will get something like:

public final class Test
{
  public static void main(String[] paramArrayOfString)
    throws Exception
  {
    int i = 0;
    while (true) { i++; if (i <= 0) break;
      try {
        if (i < 10) {
          throw new Exception("xxx");
        }
        System.out.print("xxx");
      }
      catch (Exception localException) {
        if (i >= 10)
          throw localException;
      }
    }
  }
}

which is completely wrong, because the decompiled code contains an infinite loop while the original one does not 😦 Moreover, in general, if we want to change/fix the behaviour of a buggy class we do not need to reconstruct the original source code of the whole class – in the worst case we just need to reconstruct the source code of the buggy method, and in most cases we do not need to reconstruct source code at all. javassist helps a lot there; let me provide some examples.

When displaying objects in a grid Webtop likes to throw annoying exceptions like:

at com.documentum.fc.client.impl.docbase.DocbaseExceptionMapper.newException(DocbaseExceptionMapper.java:49)
 at com.documentum.fc.client.impl.connection.docbase.MessageEntry.getException(MessageEntry.java:39)
 at com.documentum.fc.client.impl.connection.docbase.DocbaseMessageManager.getException(DocbaseMessageManager.java:137)
 at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.checkForMessages(NetwiseDocbaseRpcClient.java:310)
 at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:653)
 at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1370)
 at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1129)
 at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1362)
 at com.documentum.fc.client.impl.docbase.DocbaseApi.parameterizedFetch(DocbaseApi.java:107)
 at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.fetchFromServer(PersistentDataManager.java:201)
 at com.documentum.fc.client.impl.objectmanager.PersistentDataManager.getData(PersistentDataManager.java:92)
 at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObjectFromServer(PersistentObjectManager.java:355)
 at com.documentum.fc.client.impl.objectmanager.PersistentObjectManager.getObject(PersistentObjectManager.java:311)
 at com.documentum.fc.client.impl.session.Session.getObject(Session.java:946)
 at com.documentum.fc.client.impl.session.SessionHandle.getObject(SessionHandle.java:652)
 at com.documentum.webcomponent.library.actions.ExportAction.queryExecute(Unknown Source)
 at com.documentum.web.formext.action.ActionService.queryExecute(Unknown Source)
 at com.documentum.web.formext.control.action.ActionMultiselect.getMultiselectActionExecuteTable(ActionMultiselect.java:615)
 at com.documentum.web.formext.control.action.ActionMultiselectTag.renderEnd(ActionMultiselectTag.java:185)
 at com.documentum.web.form.ControlTag.doEndTag(Unknown Source)

the root cause of such behaviour is the following: webtop selects a list of objects to display and after that calculates preconditions; the problem is that by the time webtop calculates the preconditions the list of objects is already stale – somebody might have deleted some objects or changed permissions, and hence some preconditions throw an exception. How to fix this behaviour? I do think it is obvious (but not for the talented team) that the queryExecute method of the ActionService class must return false instead of throwing exceptions, but if we try to decompile the ActionService class we will find that it contains about 500 lines of code and, moreover, decompilers do not produce “beautiful” code – it would take a couple of hours to reconstruct the original source code of the ActionService class, whereas the javassist solution looks extremely robust and simple:

import java.io.File;
import java.io.FileOutputStream;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

import com.documentum.fc.common.DfException;
// ActionService lives in com.documentum.web.formext.action; Trace,
// ArgumentList, Context and Component come from the wdk/webtop jars as well

public class ActionServiceFix {

    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        for (Class cls : new Class[] {Trace.class }) {
            pool.importPackage(cls.getPackage().getName());
        }

        CtClass cc = pool.get(ActionService.class.getName());
        for (Class cls : new Class[] {DfException.class }) {
            pool.importPackage(cls.getPackage().getName());
        }
        CtMethod queryExecute = cc.getDeclaredMethod("queryExecute",
                pool.get(new String[] {ActionService.ActionDef.class.getName(),
                    ArgumentList.class.getName(), Context.class.getName(),
                    Component.class.getName(), boolean.class.getName() }));
        queryExecute.addCatch("Trace.error(null,t);\n;return false;",
                pool.get(Throwable.class.getName()), "t");
        File f = File.createTempFile(ActionService.class.getSimpleName(), ".class");
        FileOutputStream os = new FileOutputStream(f);
        os.write(cc.toBytecode());
        os.close();
        System.out.println(f.getAbsolutePath());
    }

}
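
The program prints the path of the patched class file. One way to apply it (assuming a standard Webtop deployment) is to copy that file into WEB-INF/classes under the com/documentum/web/formext/action package path – the webapp classloader consults WEB-INF/classes before the jars in WEB-INF/lib, so the patched class wins.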

Another concurrency issue I have already complained about: the implementation of aspects in DFC causes DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT errors:

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.JarURLConnection;
import java.net.URL;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.CtNewMethod;
import javassist.bytecode.AccessFlag;

import com.documentum.fc.client.impl.objectmanager.PersistentObjectManager;
import com.documentum.fc.common.DfException;
// DfExceptions (the isFetchSoft helper) is DFC-internal/custom – import it
// from wherever it lives in your environment

public class DfcFix {

    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();
        for (Class cls : new Class[] {}) {
            pool.importPackage(cls.getPackage().getName());
        }

        String POM = PersistentObjectManager.class.getName().replace(".", "/")
                + ".class";
        URL resourceURL = DfcFix.class.getResource("/" + POM);
        JarURLConnection connection = (JarURLConnection) resourceURL
                .openConnection();
        URL jarURL = connection.getJarFileURL();
        String fileName = jarURL.getFile();
        ZipFile zipFile = new ZipFile(fileName);
        String out = fileName + ".patched";
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(out));
        for (Enumeration e = zipFile.entries(); e.hasMoreElements();) {
            ZipEntry entryIn = (ZipEntry) e.nextElement();
            if (entryIn.getName().equals(POM)) {
                zos.putNextEntry(new ZipEntry(POM));
                zos.write(getPOMFixRename(pool));
            } else {
                zos.putNextEntry(new ZipEntry(entryIn.getName()));
                InputStream is = zipFile.getInputStream(entryIn);
                byte[] buf = new byte[1024];
                int len;
                while ((len = is.read(buf)) > 0) {
                    zos.write(buf, 0, len);
                }
            }
            zos.closeEntry();
        }
        zos.close();
        System.out.println(out);
    }

    private static byte[] getPOMFixRename(ClassPool pool) throws Exception {
        CtClass cc = pool.get(PersistentObjectManager.class.getName());
        for (Class cls : new Class[] {DfException.class, DfExceptions.class, }) {
            pool.importPackage(cls.getPackage().getName());
        }

        CtMethod original = cc.getDeclaredMethod("getObject");
        CtMethod newMethod = CtNewMethod.make(AccessFlag.PUBLIC,
                original.getReturnType(), original.getName(),
                original.getParameterTypes(), original.getExceptionTypes(),
                null, original.getDeclaringClass());
        original.setName("doGetObject");
        original.setModifiers(AccessFlag.setPrivate(original.getModifiers()));
        StringBuilder body = new StringBuilder();
        body.append("{\n");
        body.append("DfException ex = null;\n");
        body.append("for (int i = 0; i < 10; i++) {\n");
        body.append("    try {\n");
        body.append("        return doGetObject($$);\n");
        body.append("    } catch (DfException e) {\n");
        body.append("        ex = e;\n");
        body.append("        if (DfExceptions.isFetchSoft(e)) {\n");
        body.append("            DfLogger.debug(this, \"Got soft exception \"\n");
        body.append("                    + \"on {0} iteration\", new Object[] {i + 1 }, e);\n");
        body.append("            continue;\n");
        body.append("        }\n");
        body.append("        throw ex;\n");
        body.append("    }\n");
        body.append("}\n");
        body.append("throw ex;\n");
        body.append("\n}");
        newMethod.setBody(body.toString());
        cc.addMethod(newMethod);
        return cc.toBytecode();

    }

}
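
A note on the generated wrapper: it renames the original getObject to doGetObject and retries it up to 10 times; DfExceptions.isFetchSoft (a helper not shown here) is expected to classify errors like DM_OBJ_MGR_E_UNABLE_TO_FETCH_CONSISTENT_OBJECT_SNAPSHOT as soft and retriable, while everything else is rethrown immediately. The printed path points to the patched jar, which is intended to replace the original dfc.jar.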

Beware of dbi services

Do you remember the guy who accidentally discovered a SQL injection in Content Server? I can’t understand why some people do such things, so I take it for granted that we can’t prevent such misbehaviour; however, I wonder why these people come up with heart-piercing stories. Below are two more stories:

Documentum – Not able to install IndexAgent with xPlore 1.6 – everything is good except the following command listing:

[xplore@full_text_server_01 ~]$ echo 'export DEVRANDOM=/dev/urandom' >> ~/.bash_profile
[root@full_text_server_01 ~]# yum -y install rng-tools.x86_64
Loaded plugins: product-id, search-disabled-repos, security, subscription-manager
Setting up Install Process
Resolving Dependencies
--> Running transaction check
...
Transaction Test Succeeded
Running Transaction
  Installing : rng-tools-5-2.el6_7.x86_64                                                                                                                                                                                     1/1
  Verifying  : rng-tools-5-2.el6_7.x86_64                                                                                                                                                                                     1/1
 
Installed:
  rng-tools.x86_64 0:5-2.el6_7
 
Complete!
[root@full_text_server_01 ~]# rpm -qf /etc/sysconfig/rngd
rng-tools-5-2.el6_7.x86_64
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# sed -i 's,EXTRAOPTIONS=.*,EXTRAOPTIONS=\"-r /dev/urandom -o /dev/random -t 0.1\",' /etc/sysconfig/rngd
[root@full_text_server_01 ~]# cat /etc/sysconfig/rngd
# Add extra options here
EXTRAOPTIONS="-r /dev/urandom -o /dev/random -t 0.1"
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# chkconfig --level 345 rngd on
[root@full_text_server_01 ~]# chkconfig --list | grep rngd
rngd            0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# service rngd start
Starting rngd:                                             [  OK  ]
[root@full_text_server_01 ~]#

which actually looks exactly the same as my recommendations for increasing entropy on Linux/VMWare, and the real gem is how the blogpost author tried to protect himself – there are no fewer than four explanations of why it looks extremely similar:

  • I would say the source is myself
  • At that time, I opened a SR# with the EMC Support
  • These commands haven’t been provided by EMC, they are part of our IQs since 2014/2015
  • Moreover how is that a proof? I mean all I did is a sed command to update the file /etc/sysconfig/rngd and the setup of the rngd service using chkconfig… There is no magic here, there is nothing secret…

Well, I would buy the last explanation if not for the following inconsistencies:

  • What was the reason to execute rpm -qf /etc/sysconfig/rngd if you had already installed rng-tools? In my recommendations I used this command to show where the /etc/sysconfig/rngd file came from
  • The DEVRANDOM environment variable affects Content Server only; in a java environment it makes no sense
  • The second blogpost, see below…

Documentum – Increase the number of concurrent_sessions – initially the solution was posted 4 years ago on the ECN blog; moreover, it is also published in an EMC KB article (note the publication date – it is not consistent with the “A few months ago at one of our customer …” statement),

and in another EMC KB article (wow! there is even a mention of 1100).

Actually, as mentioned in my ECN blogpost, the DM_FD_SETSIZE “option” has been “officially” available since 6.7SP1P19 and 6.7SP2P04 (and in 7.0, 7.1, 7.2 and 7.3 as well; unofficially it has been available since 6.7SP1P15), so I wonder how it was possible that the DBI guys were able to do the following:

An EMC internal task (CS-40186) was opened to discuss this point and to discuss the possibility to increase this maximum number. Since the current default limit is set only in regards to the default OS value of 1024, if this value is increased to 4096 for example (which was our case since the beginning), then there is no real reason to be stuck at 1020 on Documentum side. The Engineering Team implemented a change in the binaries that allows changing the limit

Moreover, there is another inconsistency: until CS-40517 EMC was suggesting launching multiple Content Server instances on the same host in order to overcome the limit of 1020 concurrent sessions per Content Server instance; so the blogpost author would have needed to launch two Content Servers on each host to get an overall limit of 4080 concurrent sessions, whereas in my case I would have needed to launch about 10 Content Servers – and, because I considered such a configuration unmanageable, I performed some research and filed a CR in November 2012.

Database connections

Let’s discuss another statement from the performance guide (and let’s forget the fact that the writer experiences difficulties with math – I do believe that 2x2x250 is 1000, not 2000):

Tune PROCESSES: As a rule of thumb, PROCESSES should be set to the double of the sum of maximum concurrent sessions of all content server instances. For example, if there are two instances each with concurrent_sessions set to 250, PROCESSES should be set to 2000 (2x2x250). It needs to specify the scope in the alter command when change the value.

Well, the question is: why does a docbase session require two database connections? Actually, it is worth writing a continuation of the “r_object_id. Type tag” blogpost to describe how Content Server (actually DFC) generates identifiers, but the general idea is the following: when DFC creates a new object (i.e. when we call the IDfSession#newObject method) it sends a NEXT_ID or NEXT_ID_LIST RPC command to Content Server:

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,10
...
q0
API> ?,c,q0
next_id         
----------------
08024be98005fe92
08024be98005fe93
08024be98005fe94
08024be98005fe95
08024be98005fe96
08024be98005fe97
08024be98005fe98
08024be98005fe99
08024be98005fe9a
08024be98005fe9b
(1 row affected)

and Content Server modifies data stored in the dmi_sequence_s table. Now imagine that we are creating tons of objects in parallel and, moreover, taking advantage of transactions – how do the NEXT_ID/NEXT_ID_LIST commands work in this case? If CS modified the data in the dmi_sequence_s table inside the current transaction, the corresponding rows would become locked and other transactions would be unable to modify the same data; so, to prevent such a concurrency issue, when performing NEXT_ID/NEXT_ID_LIST commands Content Server creates a new (temporary) database connection and never releases it:

API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    14
(1 row affected)

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,1000
...
q0
API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    15
(1 row affected)
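
On the DFC side nothing special is required to trigger this behaviour – any object creation will do. A minimal sketch (type name and attribute value are illustrative; because DFC pools preallocated identifiers, the NEXT_ID_LIST RPC fires only when the cached range is exhausted):

import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSysObject;
import com.documentum.fc.common.DfException;

public final class NewObjectSketch {

    static IDfSysObject create(IDfSession session) throws DfException {
        // newObject() consumes an identifier from DFC's pool; when the pool
        // is empty DFC issues a NEXT_ID_LIST RPC, which in turn may force
        // Content Server to open the "temporary" database connection
        IDfSysObject object = (IDfSysObject) session.newObject("dm_document");
        object.setObjectName("generated");
        object.save();
        return object;
    }
}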

you may observe a reference to this temporary database connection in the output of the SHOW_SESSIONS command (note the value of tempdb_session_ids; also note that, in general, tempdb_session_ids in SHOW_SESSIONS output displays wrong data):

USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328
  pid                          [0]: 21067
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:33:16
  last_used                    [0]: 5/10/2017 13:36:52
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, NEXT_ID_LIST, EXEC, EXEC, TIME, FETCH_TYPE, GET_SERVER_CONFI, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS,
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:34:39
  transaction_status           [0]: Off
  dormancy_status              [0]: Active

So, in the long term (actually, both CS and DFC try to pool “available” identifiers) every docbase session tends to have at least two associated database connections – that is a starting point for database capacity planning. But the question is whether this is an upper bound or not, i.e. if we follow the “best practices” and set the value of the PROCESSES parameter to the double of the sum of the maximum concurrent sessions of all content server instances, will it prevent us from getting “ORA-00020: maximum number of processes (%s) exceeded” errors or not? And, unfortunately, the answer is: no, it won’t:

API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> apply,c,,SHOW_SESSIONS
...
q7
API> next,c,q7
...
OK
API> dump,c,q7
...
USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328:74:352:281:270:141
  pid                          [0]: 21787
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:45:30
  last_used                    [0]: 5/10/2017 13:45:37
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, EXEC, EXEC, EXEC, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS, 
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:45:34
  transaction_status           [0]: Off
  dormancy_status              [0]: Active