Q & A. XIV

Hi,
I need to link multiple documents to folder in documentum.
those documents are checked out. Can you please advise how can i achieve this?

Thanks
Ram

Unfortunately, this guy didn’t provide a good description for his problem, so, let’s pretend that he is trying to do something like:

update dm_document objects
link '/target/folder'
where ...

and getting something like:

API> ?,c,update dm_document objects link '/Temp' where r_object_id='09024be98006ab34'
[DM_QUERY_F_UP_SAVE]fatal:  "UPDATE:  An error has occurred during a save operation."

[DM_SYSOBJECT_E_LOCKED]error:  
  "The operation on  sysobject was unsuccessful because it is locked by user DCTM_DEV."


API> get,c,09024be98006ab34,r_lock_owner
...
DCTM_DEV

Now follow my hands – it is a magic:

API> ?,c,alter group dm_escalated_allow_save_on_lock  add dmadmin

API> ?,c,update dm_document objects link '/Temp' where r_object_id='09024be98006ab34'
objects_updated
---------------
              1
(1 row affected)
[DM_QUERY_I_NUM_UPDATE]info:  "1 objects were affected by your UPDATE statement."


API> revert,c,09024be98006ab34
...
OK
API> get,c,09024be98006ab34,r_lock_owner
...
DCTM_DEV

Database connections. Part II

Well, previosely we defined an estimation for database connection – about twice amount of concurrent Documentum sessions – may be less, may be more, depends on application. Now the question: how many connections is it possible to create in database? OpenText thinks that the number of database connections is unlimited:

D2 runs on content server which has good scalability by adding additional content server nodes. Content server is often the bottleneck of the whole D2 topology when the system is running under a heavy load condition. When content server reaches its bottleneck, we could monitored the CPU usage of content server is up to 90% and the number of active sessions grows very fast. To add one additional content server node to the existing environment could improve system throughput significantly.
Officially we suggests adding one additional content server node on every 300 concurrent users’ growth. The mileage

which is actually not true, on the other hand if OpenText has written something like: our product fails to take advantage of best practices and does not pool database connections, it would be ridiculous, so instead of improving product they has preferred to declare another marketing bullshit.

So, why database connection pooling is important? If you try to ask google you will find something like: creating database connection is an expensive and time-consuming operation: application needs to perform TCP (or even TLS) handshake, authenticate, database needs to start new process, etc…, so, it is recommended to pool database connections. Unfortunately it is only a half of the truth – pools also limit the number of concurrent database connections, and it is important too, let me quote the best oracle database expert ever:

In looking at your Automatic Workload Repository report, I see that the longest-running events at the system level are latch-related: cache buffers chains and library cache. Additionally, your CPU time was way up there. Concurrency-based waits are caused by one thing and one thing only: having many concurrently active sessions. If you had fewer concurrently active sessions, you would by definition have fewer concurrency-based waits (fewer people contending with each other). I see that you had 134 sessions in the database running on a total of 4 CPU cores. Because it is not really possible for 4 CPU cores to allow 134 sessions to be concurrently active, I recommend that you decrease the number of concurrent sessions by reducing the size of your connection pool—radically. Cut your connection pool to something like 32. It is just not possible for 4 cores to run 134 things at the same time; 4 cores can run only 4 things at exactly the same time. If you reduce the concurrent user load, you’ll see those waits go down, response time go down, and your transaction rates improve. In short, you’ll find out that you can actually do more work with fewer sessions than you can with more.

I know that this fewer-does-more suggestion sounds counterintuitive, but I encourage you to watch this narrated Real World Performance video.

In this video, you’ll see what happens in a test of a 12-core machine running transactions and decreasing the size of a connection pool from an unreasonable number (in the thousands) to a reasonable number: 96. At 96 connections on this particular machine, the machine was able to do 150 percent the number of transactions per second and took the response time for these transactions from ~100 milliseconds down to ~5 milliseconds.

Short of reducing your connection pool size (and therefore changing the way the application is using the database by queuing in the middle-tier connection pool instead of queuing hundreds of active sessions in the database), you would have to change your queries to make them request cache buffers chains latches less often. In short: tune the queries and the algorithms in the application. There is literally no magic here. Tweaking things at the system level might not be an option. Touching the application might have to be an option.

And from Documentum perspective, the only option to limit the number of database connections is to use shared server feature (fuck yeah, Oracle supports Database Resident Connection Pooling since 11g, but mature product does not). And do not pay much attention to EMC’s documents like Optimizing Oracle for EMC Documentum – such documents are wrong from beginning to end.

Beware of dbi services

Do you remember a guy, who accidentally discovered SQL injection in Content Server? I can’t understand why some people do such things, so I take it for granted that we can’t prevent such misbehaviour, however I wonder why these people come up with heart-piercing stories. Below are two another stories:

Documentum – Not able to install IndexAgent with xPlore 1.6 – everything is good except following command listing:

[xplore@full_text_server_01 ~]$ echo 'export DEVRANDOM=/dev/urandom' >> ~/.bash_profile
[root@full_text_server_01 ~]# yum -y install rng-tools.x86_64
Loaded plugins: product-id, search-disabled-repos, security, subscription-manager
Setting up Install Process
Resolving Dependencies
--> Running transaction check
...
Transaction Test Succeeded
Running Transaction
  Installing : rng-tools-5-2.el6_7.x86_64                                                                                                                                                                                     1/1
  Verifying  : rng-tools-5-2.el6_7.x86_64                                                                                                                                                                                     1/1
 
Installed:
  rng-tools.x86_64 0:5-2.el6_7
 
Complete!
[root@full_text_server_01 ~]# rpm -qf /etc/sysconfig/rngd
rng-tools-5-2.el6_7.x86_64
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# sed -i 's,EXTRAOPTIONS=.*,EXTRAOPTIONS=\"-r /dev/urandom -o /dev/random -t 0.1\",' /etc/sysconfig/rngd
[root@full_text_server_01 ~]# cat /etc/sysconfig/rngd
# Add extra options here
EXTRAOPTIONS="-r /dev/urandom -o /dev/random -t 0.1"
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# chkconfig --level 345 rngd on
[root@full_text_server_01 ~]# chkconfig --list | grep rngd
rngd            0:off   1:off   2:off   3:on    4:on    5:on    6:off
[root@full_text_server_01 ~]#
[root@full_text_server_01 ~]# service rngd start
Starting rngd:                                             [  OK  ]
[root@full_text_server_01 ~]#

which actually looks exactly the same as my recommendations for increasing entropy on Linux/VMWare, and the real gem is how blogpost author tried to protect himself – there are even four explanations why it looks extremely similar:

  • I would say the source is myself
  • At that time, I opened a SR# with the EMC Support
  • These commands haven’t been provided by EMC, they are part of our IQs since 2014/2015
  • Moreover how is that a proof? I mean all I did is a sed command to update the file /etc/sysconfig/rngd and the setup of the rngd service using chkconfig… There is no magic here, there is nothing secret…

Well, I would buy the last explanation if there were no following inconsistencies:

  • What was the reason to execute rpm -qf /etc/sysconfig/rngd if you already installed rng-tools? In my recommendations I used this command to show where /etc/sysconfig/rngd file came from
  • DEVRANDOM environment variable affects Content Server only, in java environment it does not make sense
  • The second blogpost, see below…

Documentum – Increase the number of concurrent_sessions – initially the solution was posted 4 years ago on ECN blog, moreover it is also published in EMC KB (note the publication date – it is not consistent with “A few months ago at one of our customer …” statement):

and in another EMC KB (wow! there is a mention of 1100):

Actually, as it was mentioned in my ECN blogpost – the DM_FD_SETSIZE “option” is “officially” available since 6.7SP1P19 and 6.7SP2P04 (and as well in 7.0, 7.1, 7.2 and 7.3, not officially this option is available since 6.7SP1P15), so, I wonder how it was possible that DBI guys were able to do following:

An EMC internal task (CS-40186) was opened to discuss this point and to discuss the possibility to increase this maximum number. Since the current default limit is set only in regards to the default OS value of 1024, if this value is increased to 4096 for example (which was our case since the beginning), then there is no real reason to be stuck at 1020 on Documentum side. The Engineering Team implemented a change in the binaries that allows changing the limit

Moreover, there is another inconsistency: until CS-40517 EMC was suggesting to launch multiple Content Server instances on the same host in order to overcome the limit on 1020 concurrent sessions per Content Server instance, so in case of blogpost author he was need to launch two Content Servers on each host and get an overall limit of 4080 concurrent sessions, but in my case I was need to launch about 10 Content Servers, and, because I was considering such configuration as unmanageable, I performed some research and filed a CR on November 2012.

Database connections

Let’s discuss another statement from performance guide (forget about the fact that the writer experiences difficulties with math (I do believe that 2x2x250 is 1000 but not 2000)):

Tune PROCESSES: As a rule of thumb, PROCESSES should be set to the double of the sum of maximum concurrent sessions of all content server instances. For example, if there are two instances each with concurrent_sessions set to 250, PROCESSES should be set to 2000 (2x2x250). It needs to specify the scope in the alter command when change the value.

Well, the question is: why does docbase session require two database connections? Actually, it is worth writing a continuation for “r_object_id. Type tag” blogpost and describe how Content Server (actually DFC) generates identifiers, but the common idea is following: when DFC creates new object (i.e. when we call IDfSession#newObject method) it sends NEXT_ID or NEXT_ID_LIST RPC command to Content Server:

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,10
...
q0
API> ?,c,q0
next_id         
----------------
08024be98005fe92
08024be98005fe93
08024be98005fe94
08024be98005fe95
08024be98005fe96
08024be98005fe97
08024be98005fe98
08024be98005fe99
08024be98005fe9a
08024be98005fe9b
(1 row affected)

and Content Server modifies data stored in dmi_sequence_s table. Now imagine that we are creating tons of objects in parallel and, moreover, we are taking advantage of using transactions, how do NEXT_ID/NEXT_ID_LIST commands work in this case? If CS modifies data in dmi_sequence_s table in transaction, corresponding rows become locked and another transaction unable to modify the same data, so, to prevent such concurrency issue when performing NEXT_ID/NEXT_ID_LIST commands Content Server creates new (temporary) database connection and never releases it:

API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    14
(1 row affected)

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,1000
...
q0
API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    15
(1 row affected)

you may observe the reference to this temporary database connection in output of SHOW_SESSIONS command (note the value of tempdb_session_ids, also note that, in general, tempdb_session_ids in SHOW_SESSIONS output displays wrong data):

USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328
  pid                          [0]: 21067
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:33:16
  last_used                    [0]: 5/10/2017 13:36:52
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, NEXT_ID_LIST, EXEC, EXEC, TIME, FETCH_TYPE, GET_SERVER_CONFI, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS,
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:34:39
  transaction_status           [0]: Off
  dormancy_status              [0]: Active

So, in long term perspective (actually, both CS and DFC tries to pool “available” identifiers) every docbase session tends to have at least two associated database connections – that is a starting point for database capacity planning, but the question is whether it is an upper bound or not, i.e if we follow “best practices” and set the value of PROCESSES parameter to the double of the sum of maximum concurrent sessions of all content server instances will it prevent us from getting “ORA-00020: maximum number of processes (%s) exceeded” errors or not? And, unfortunately, the answer is: no, it won’t:

API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> apply,c,,SHOW_SESSIONS
...
q7
API> next,c,q7
...
OK
API> dump,c,q7
...
USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328:74:352:281:270:141
  pid                          [0]: 21787
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:45:30
  last_used                    [0]: 5/10/2017 13:45:37
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, EXEC, EXEC, EXEC, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS, 
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:45:34
  transaction_status           [0]: Off
  dormancy_status              [0]: Active

CMIS

About 6 moths ago I was complaining about CMIS that it considers all requests which contain the same credentials (i.e. login and password are the same for all requests) as requests from the same client, and for all requests containing the same credentials CMIS uses the only one repository session, unfortunately we didn’t implement solution proposed in that blogpost – it is hard to maintain different passwords across clients, meanwhile we have started receiving concurrency-related errors, so, it was required to undertake something and I have found a solution – it is enough to replace just two classes in CMIS:

JMS high availability feature. Part II

Why I did recall a feature, which I have never used before and will never use in the future? The explanation is following: In order to refresh my memory I was reading installation guide for Content Server 7.3 and noticed following statement:

Actually, documentation does not explain what does mean “methods requiring trusted authentication”, it seems that remote JMS supports workflow methods only, but from any perspective this statement sounds weird, the problem is on that moment I already discovered vulnerability in Content Server which allows attacker to download $DOCUMENTUM_SHARED/config/dfc.keystore file, this file is very interesting because it allows to connect to Content Server as superuser (note the value of server_trust_priv flag):

[dmadmin@docu72dev01 config]$ keytool -list -v -keystore dfc.keystore 
Enter keystore password:  

*****************  WARNING WARNING WARNING  *****************
* The integrity of the information stored in your keystore  *
* has NOT been verified!  In order to verify its integrity, *
* you must provide your keystore password.                  *
*****************  WARNING WARNING WARNING  *****************

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

Alias name: dfc
Creation date: May 5, 2015
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a, O=EMC, OU=Documentum
Issuer: CN=dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a, O=EMC, OU=Documentum
Serial number: 4d23be10ce8e183732c451091e0e3dbf
Valid from: Tue May 05 16:03:10 MSK 2015 until: Fri May 02 16:08:10 MSK 2025
Certificate fingerprints:
         MD5:  8B:BD:5C:F6:18:9D:27:9F:28:A7:69:A4:45:AD:32:63
         SHA1: 37:CC:14:C7:3E:BA:8F:AF:CE:E8:E5:4E:D2:F5:01:AF:3E:B6:1D:3F
         SHA256: 88:FA:7A:04:F8:47:AE:88:AC:EB:D5:BE:28:80:A6:7E:21:51:34:86:A5:96:0E:FF:11:61:90:E9:EA:AC:B4:0C
         Signature algorithm name: SHA1withRSA
         Version: 1


*******************************************
*******************************************


API> retrieve,c,dm_client_rights where client_id='dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a'
...
08024be980000587
API> dump,c,l
...
USER ATTRIBUTES

  object_name                     : dfc_docu72dev01_3dXr0a
  title                           :
  subject                         :
  authors                       []: <none>
  keywords                      []: <none>
  resolution_label                :
  owner_name                      : dmadmin
  owner_permit                    : 7
  group_name                      : docu
  group_permit                    : 1
  world_permit                    : 1
  log_entry                       :
  acl_domain                      : dmadmin
  acl_name                        : dm_45024be980000222
  language_code                   :
  client_id                       : dfc_zOkF5qKyACcQUjLJD2bt1y3dXr0a
  public_key_identifier           : 5F6CF69241D4745C01C943BAD1AFFB027398EF32
  host_name                       : docu72dev01
  allowed_roles                 []: <none>
  allow_all_roles                 : T
  allow_all_priv_modules          : F
  principal_auth_priv             : T
  server_trust_priv               : T
  app_name                        :
  is_globally_managed             : F

So, there is a kind of interesting situation: official software is unable to take advantage of trusted authentication, but attacker can 🙂

But on the last week EMC published another interesting support note – JMS high availability feature does not work:

dfc.query.should_include_object_name

Have never thought that my colleagues may teach me something…

Yesterday I asked my colleague, who is trying to improve his skills in performance optimisation, whether he had any idea how to improve this SQL statement:

SELECT ALL dm_folder.r_object_id
  FROM dm_folder_sp dm_folder
 WHERE     (    EXISTS
                   (SELECT r_object_id
                      FROM dm_folder_r
                     WHERE     dm_folder.r_object_id = r_object_id
                           AND r_folder_path = :"SYS_B_00")
            AND (dm_folder.object_name = :"SYS_B_01"))
       AND (    dm_folder.i_has_folder = :"SYS_B_02"
            AND dm_folder.i_is_deleted = :"SYS_B_03")

and, surprisingly, the answer was: “Yes, I have seen something similar on support site – EMC suggest to set dfc.query.should_include_object_name and dfc.query.should_include_object_name properties, something like:

dfc.query.object_name_for_docbase[0]=<docbase_name>
dfc.query.should_include_object_name[0]=false


Well, as was expected both dfc.query.should_include_object_name and dfc.query.should_include_object_name properties are not documented, so let discuss the problem more thoroughly.

Imagine that we are maintaining following folder structure in our docbase:

\_CLIENT_1
  \_CLAIMS
  \_INVOCES
\_CLINET_2
  \_CLAIMS
  \_INVOCES
...
\_CLIENT_X
  \_CLAIMS
  \_INVOCES

i.e. for every client we create the same folder structure and when we want to store invoice for particular client we do something like:

create,c,dm_document
set,c,l,object_name
xxx
link,c,/CLIENTS/CLINET_1/INVOICES
save,c,l

the problem is that upon link call DFC calls IDfSession#getFolderByPath method to retrieve folder object with particular path, and inside IDfSession#getFolderByPath method DFC does following: it cuts off object name part from the path (i.e. everything after last ‘/’) and builds following DQL query:

SELECT r_object_id FROM dm_folder 
WHERE object_name='INVOICES' 
 AND ANY r_folder_path='/CLIENTS/CLINET_1/INVOICES'

such implementation is bit weird for two reasons:

  • when I do the same I just write something like “retrieve,c,dm_folder where any r_folder_path=”…” and do not bother myself about object name
  • Content Server has a build-in FolderIdFindByPath RPC command:
    API> apply,c,,FolderIdFindByPath,_FOLDER_PATH_,S,/dmadmin
    ...
    q0
    API> next,c,q0
    ...
    OK
    API> get,c,q0,result
    ...
    0c01d92080000105
    API> close,c,q0
    ...
    OK

    which generates following effective SQL statement:

    select r_object_id from dm_folder_r where r_folder_path = :p0

so, I have no idea why DFC performs extra logic here, moreover, in case of current DFC implementation we are getting overcomplicated SQL query and, sometimes database engine fails to build a good execution plan for this query (this is caused by dumb recommendation to set CURSOR_SHARING database parameter to FORCE and depending on docbase structure execution of such query may take minutes). Below are two possible execution plans for this query:

good (dm_folder_r is a leading table – querying dm_folder_r table by r_folder_path will always return not more than one row):

and bad (dm_folder_r is not a leading table – imagine that we have 1 million clients and hence 1 million INVOICE folders, so querying dm_sysobjec_s table by object_name first will return 1 million records):

in case of “retrieve,c,dm_folder where any r_folder_path=”…” execution plan is always good:

In 2011 (if my memory serves me right), I solved such performance problem by marking index on dm_folder_r(r_folder_path) as unique – in this case database engine always builds the correct execution plan because it knows that querying dm_folder_r table will always return not more than one row, however in recent versions DFC it is possible to disable it’s dumb behaviour by setting dfc.query.should_include_object_name and dfc.query.should_include_object_name properties – can’t understand why this wasn’t enabled by default.