Why CURSOR_SHARING=FORCE sucks. Part II

I believe everybody who maintains a Documentum repository with an intensive workflow sees the following query at the top of database performance reports:

UPDATE /*+ USE_NL(dmi_workitem_s) */
      dmi_workitem_s
   SET a_wq_name = :p0
 WHERE r_object_id =
          ANY (SELECT /*+ CARDINALITY(1) */
                      wis2.r_object_id
                 FROM (SELECT wis3.r_object_id AS r_object_id
                         FROM (  SELECT /*+ CARDINALITY(1) */
                                        wis.r_object_id AS r_object_id
                                   FROM dmi_workitem_s wis
                                  WHERE     wis.r_runtime_state >= 0
                                        AND wis.r_runtime_state <= 1
                                        AND wis.r_auto_method_id >
                                               '0000000000000000'
                                        AND wis.a_wq_name = ' '
                                        AND wis.r_next_retry_date < SYSDATE
                               ORDER BY r_creation_date ASC) wis3
                        WHERE ROWNUM <= 90) wis2);

This query is executed by the workflow agent, and its poor performance actually reveals a database design flaw: the dmi_workitem_s table does not contain a column with high selectivity:

  • records with r_runtime_state IN (0, 1) relate to both auto and manual activities
  • records with r_auto_method_id > '0000000000000000' relate to both completed and non-completed auto activities

Actually, in the case of MSSQL and PostgreSQL it would be possible to create an “ideal” index for this query, like:

CREATE INDEX idx_auto_tasks
   ON dmi_workitem_s (r_object_id, r_next_retry_date, r_creation_date)
   WHERE r_runtime_state >= 0
     AND r_runtime_state <= 1
     AND r_auto_method_id > '0000000000000000'
     AND a_wq_name = ' ';

because both MSSQL and PostgreSQL support partial indexes; in the case of Oracle, however, the solution is not so straightforward: it requires either rewriting the query to the following form:

UPDATE /*+ USE_NL(dmi_workitem_s) */
      dmi_workitem_s
   SET a_wq_name = :p0
 WHERE r_object_id =
          ANY (SELECT /*+ CARDINALITY(1) */
                      wis2.r_object_id
                 FROM (SELECT wis3.r_object_id AS r_object_id
                         FROM (  SELECT /*+ CARDINALITY(1) */
                                        wis.r_object_id AS r_object_id
                                   FROM dmi_workitem_s wis
                                  WHERE     CASE
                                               WHEN     wis.r_runtime_state >=
                                                           0
                                                    AND wis.r_runtime_state <=
                                                           1
                                                    AND wis.r_auto_method_id >
                                                           '0000000000000000'
                                                    AND wis.a_wq_name = ' '
                                               THEN
                                                  1
                                            END = 1
                                        AND wis.r_next_retry_date < SYSDATE
                               ORDER BY r_creation_date ASC) wis3
                        WHERE ROWNUM <= 90) wis2);

and creating the following function-based index:

CREATE INDEX idx_auto_tasks
   ON dmi_workitem_s (
      CASE
         WHEN     r_runtime_state >= 0
              AND r_runtime_state <= 1
              AND r_auto_method_id > '0000000000000000'
              AND a_wq_name = ' '
         THEN
            1
      END);

or creating a materialized view like:

CREATE MATERIALIZED VIEW mv_auto_tasks
   REFRESH FAST ON COMMIT
   ENABLE QUERY REWRITE
AS
   SELECT wis.r_object_id, wis.r_creation_date, wis.r_next_retry_date
     FROM dmi_workitem_s wis
    WHERE     wis.r_runtime_state >= 0
          AND wis.r_runtime_state <= 1
          AND wis.r_auto_method_id > '0000000000000000'
          AND wis.a_wq_name = ' ';

and taking advantage of query rewrite:

---------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                   |                    |     1 |    70 |     3  (34)| 00:00:01 |
|   1 |  UPDATE                            | DMI_WORKITEM_S     |       |       |            |          |
|   2 |   NESTED LOOPS                     |                    |     1 |    70 |     3  (34)| 00:00:01 |
|   3 |    VIEW                            |                    |     1 |    10 |     3  (34)| 00:00:01 |
|*  4 |     COUNT STOPKEY                  |                    |       |       |            |          |
|   5 |      VIEW                          |                    |     1 |    10 |     3  (34)| 00:00:01 |
|*  6 |       SORT ORDER BY STOPKEY        |                    |     1 |    28 |     3  (34)| 00:00:01 |
|*  7 |        MAT_VIEW REWRITE ACCESS FULL| MV_AUTO_TASKS      |     1 |    28 |     2   (0)| 00:00:01 |
|*  8 |    INDEX UNIQUE SCAN               | D_1F024BE98000018C |     1 |    60 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(ROWNUM<=90)
   6 - filter(ROWNUM<=90)
   7 - filter("MV_AUTO_TASKS"."R_NEXT_RETRY_DATE"<SYSDATE@!)
   8 - access("R_OBJECT_ID"="WIS2"."R_OBJECT_ID")
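
A side note on the materialized view option: REFRESH FAST ON COMMIT normally requires a materialized view log on the base table, so something along the following lines would be needed as well (a minimal sketch – the column list is an assumption derived from the MV definition above):

CREATE MATERIALIZED VIEW LOG ON dmi_workitem_s
   WITH ROWID (r_runtime_state, r_auto_method_id, a_wq_name,
               r_creation_date, r_next_retry_date);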

Unfortunately, due to the CURSOR_SHARING=FORCE recommendation neither option is applicable, and the only “option” is to use a hex editor to modify the Documentum binary – in the case of Oracle we need to place a CURSOR_SHARING_EXACT hint and modify the WHERE clause.
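
For illustration only, the hint simply disables literal replacement for a single statement, so function-based indexes and query rewrite become usable again – a minimal sketch (not the actual patched query):

SELECT /*+ CURSOR_SHARING_EXACT */
       COUNT(*)
  FROM dmi_workitem_s
 WHERE a_wq_name = ' ';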

Why CURSOR_SHARING=FORCE sucks

As you might have guessed, my colleagues involved me in load testing activities. Actually, there is nothing challenging in writing load tests (I do think that analysis is another topic), but forcing Documentum to work under load is a big challenge. On the first iteration we got the following docbase sessions graph:

This is actually weird, because I suppose that the collected metrics should somehow reflect the load; in our scenario we applied an approximately constant load, but the docbase sessions metric did not look constant 😦 , which could mean one of the following:

  • I’m an idiot and my estimations were wrong, but in that case it would mean that the customer should not use Documentum due to unpredictable performance
  • Something was wrong on the Documentum side

Let’s check database side …

Recommendation 1: SQL Tuning
Estimated benefit is 22.41 active sessions, 56.8% of total activity.
--------------------------------------------------------------------
Action
   Investigate the SELECT statement with SQL_ID "34h8xv6j5mx04" for 
   possible performance improvements. You can supplement the information 
   given here with an ASH report for this SQL_ID.
   Related Object
      SQL statement with SQL_ID 34h8xv6j5mx04.
      select gs.group_name, gs.is_dynamic, 
      gs.is_dynamic_default,gs.is_protected, gs.is_module_only from 
      dm_group_s gs, dm_group_r gr where gs.r_object_id = gr.r_object_id 
      and gs.is_dynamic = :"SYS_B_0" and gs.is_dynamic_default = :"SYS_B_1" 
      and (gr.users_names = :P0 or gr.groups_names = :"SYS_B_2" or 
      gr.groups_names in (select gr1.i_nondyn_supergroups_names from 
      dm_group_r gr1, dm_group_r gr2 where gr1.r_object_id = 
      gr2.r_object_id and (gr2.users_names = :P1 or gr2.groups_names = 
      :"SYS_B_3") and gr1.i_nondyn_supergroups_names IS NOT NULL))


SQL ID: 34h8xv6j5mx04                     DB/Inst: ECM/ECM      Snaps: 13-14
-> 1st Capture and Last Capture Snap IDs
   refer to Snapshot IDs witin the snapshot range
-> select gs.group_name, gs.is_dynamic, gs.is_dynamic_default,gs.is_prote...

    Plan Hash           Total Elapsed                 1st Capture   Last Capture
#   Value                    Time(ms)    Executions       Snap ID        Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1   2451896125             85,924,271       16,7191            14             14
          -------------------------------------------------------------       

Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                    |       |       | 15936 (100)|          |       |      
|   1 |  FILTER                             |                    |       |       |            |          |       |      
|   2 |   PX COORDINATOR                    |                    |       |       |            |          |       |      
|   3 |    PX SEND QC (RANDOM)              | :TQ10001           |  9328K|   676M| 15936   (2)| 00:03:12 |       |      
|   4 |     HASH JOIN                       |                    |  9328K|   676M| 15936   (2)| 00:03:12 |       |      
|   5 |      BUFFER SORT                    |                    |       |       |            |          |       |      
|   6 |       PART JOIN FILTER CREATE       | :BF0000            |   386K|    19M|  1859   (1)| 00:00:23 |       |      
|   7 |        PX RECEIVE                   |                    |   386K|    19M|  1859   (1)| 00:00:23 |       |      
|   8 |         PX SEND PARTITION (KEY)     | :TQ10000           |   386K|    19M|  1859   (1)| 00:00:23 |       |      
|   9 |          TABLE ACCESS BY INDEX ROWID| DM_GROUP_S         |   386K|    19M|  1859   (1)| 00:00:23 |       |      
|  10 |           INDEX RANGE SCAN          | D_XXXXXXXX8000010A |   773K|       |   124   (1)| 00:00:02 |       |      
|  11 |      PX PARTITION HASH JOIN-FILTER  |                    |    18M|   407M| 14035   (2)| 00:02:49 |:BF0000|:BF000
|  12 |       TABLE ACCESS FULL             | DM_GROUP_R         |    18M|   407M| 14035   (2)| 00:02:49 |:BF0000|:BF000
|  13 |   NESTED LOOPS                      |                    |     4 |   228 |     3   (0)| 00:00:01 |       |      
|  14 |    NESTED LOOPS                     |                    |    88 |   228 |     3   (0)| 00:00:01 |       |      
|  15 |     PARTITION HASH ALL              |                    |     4 |   136 |     2   (0)| 00:00:01 |     1 |    20
|  16 |      INDEX RANGE SCAN               | D_XXXXXXXX80000056 |     4 |   136 |     2   (0)| 00:00:01 |     1 |    20
|  17 |     PARTITION HASH ITERATOR         |                    |    22 |       |     1   (0)| 00:00:01 |   KEY |   KEY
|  18 |      INDEX RANGE SCAN               | D_XXXXXXXX80000109 |    22 |       |     1   (0)| 00:00:01 |   KEY |   KEY
|  19 |    TABLE ACCESS BY LOCAL INDEX ROWID| DM_GROUP_R         |     1 |    23 |     1   (0)| 00:00:01 |     1 |     1
------------------------------------------------------------------------------------------------------------------------
 
Full SQL Text

SQL ID       SQL Text                                                         
------------ -----------------------------------------------------------------
34h8xv6j5mx0 select gs.group_name, gs.is_dynamic, gs.is_dynamic_default, gs.is
             _protected, gs.is_module_only from dm_group_s gs, dm_group_r gr w
             here gs.r_object_id = gr.r_object_id and gs.is_dynamic = :"SYS_B_
             0" and gs.is_dynamic_default = :"SYS_B_1" and (gr.users_names = :
             P0 or gr.groups_names = :"SYS_B_2" or gr.groups_names in (select 
             gr1.i_nondyn_supergroups_names from dm_group_r gr1, dm_group_r gr
             2 where gr1.r_object_id = gr2.r_object_id and (gr2.users_names = 
             :P1 or gr2.groups_names = :"SYS_B_3") and gr1.i_nondyn_supergroup
             s_names IS NOT NULL))

That is awesome! A single SQL query consumes 60% of database resources! Before trying to optimize this query we need to figure out how it really looks – how can we tell that we have optimized a query if we don’t know how to execute it? And here it is:

SELECT gs.group_name,
       gs.is_dynamic,
       gs.is_dynamic_default,
       gs.is_protected,
       gs.is_module_only
  FROM dm_group_s gs, dm_group_r gr
 WHERE     gs.r_object_id = gr.r_object_id
       AND gs.is_dynamic = 1
       AND gs.is_dynamic_default = 1
       AND (   gr.users_names = :P0
            OR gr.groups_names = 'dm_world'
            OR gr.groups_names IN (
          SELECT gr1.i_nondyn_supergroups_names
            FROM dm_group_r gr1, dm_group_r gr2
           WHERE     gr1.r_object_id = gr2.r_object_id
                     AND (   gr2.users_names = :P1
                             OR gr2.groups_names = 'dm_world')
                     AND gr1.i_nondyn_supergroups_names IS NOT NULL)
       );
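
For the record, the literal values hidden behind the :"SYS_B_n" binds can usually be recovered from the cursor cache – a sketch (bind values are captured only periodically and may have aged out):

SELECT position, name, value_string
  FROM v$sql_bind_capture
 WHERE sql_id = '34h8xv6j5mx04'
 ORDER BY position;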

This query is intended to return dynamic groups enabled by default (gs.is_dynamic = 1 AND gs.is_dynamic_default = 1). Now guess how many dynamic groups enabled by default our docbase has? Zero! 60% of database resources to return an empty resultset! But due to CURSOR_SHARING=FORCE Oracle does not understand that we are asking it to return an empty resultset, so it selects a “suboptimal” execution plan, though the optimal one is the following:

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                    |     1 |    77 |     2   (0)| 00:00:01 |       |       |
|*  1 |  FILTER                             |                    |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                      |                    |     1 |    77 |     2   (0)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                     |                    |    25 |    77 |     2   (0)| 00:00:01 |       |       |
|*  4 |     TABLE ACCESS BY INDEX ROWID     | DM_GROUP_S         |     1 |    53 |     1   (0)| 00:00:01 |       |       |
|*  5 |      INDEX RANGE SCAN               | D_XXXXXXXX8000010A |     1 |       |     1   (0)| 00:00:01 |       |       |
|   6 |     PARTITION HASH ITERATOR         |                    |    25 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|*  7 |      INDEX RANGE SCAN               | D_XXXXXXXX80000109 |    25 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|   8 |    TABLE ACCESS BY LOCAL INDEX ROWID| DM_GROUP_R         |    25 |   600 |     1   (0)| 00:00:01 |     1 |     1 |
|   9 |   NESTED LOOPS                      |                    |     2 |   114 |     3   (0)| 00:00:01 |       |       |
|  10 |    NESTED LOOPS                     |                    |    26 |   114 |     3   (0)| 00:00:01 |       |       |
|  11 |     PARTITION HASH ALL              |                    |    15 |   495 |     2   (0)| 00:00:01 |     1 |    20 |
|* 12 |      INDEX RANGE SCAN               | D_XXXXXXXX80000056 |    15 |   495 |     2   (0)| 00:00:01 |     1 |    20 |
|  13 |     PARTITION HASH ITERATOR         |                    |    13 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 14 |      INDEX RANGE SCAN               | D_XXXXXXXX80000109 |    13 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 15 |    TABLE ACCESS BY LOCAL INDEX ROWID| DM_GROUP_R         |     1 |    24 |     1   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------------------
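
One common way to pin the good plan above is a SQL plan baseline – a minimal sketch, assuming the desired plan is already present in the cursor cache (the plan_hash_value below is purely hypothetical):

DECLARE
   l_loaded PLS_INTEGER;
BEGIN
   -- load the good plan from the cursor cache into a SQL plan baseline
   l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                  sql_id          => '34h8xv6j5mx04',
                  plan_hash_value => 1234567890);
   DBMS_OUTPUT.PUT_LINE('plans loaded: ' || l_loaded);
END;
/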

So, we stabilised the execution plan for this query, and on the next iteration we got the following docbase sessions graph:

Good news: I’m not an idiot.

Database connections. Part II

Well, previously we came up with an estimate for database connections – about twice the number of concurrent Documentum sessions – maybe less, maybe more, depending on the application. Now the question is: how many connections is it possible to create in the database? OpenText thinks that the number of database connections is unlimited:

D2 runs on content server which has good scalability by adding additional content server nodes. Content server is often the bottleneck of the whole D2 topology when the system is running under a heavy load condition. When content server reaches its bottleneck, we could monitored the CPU usage of content server is up to 90% and the number of active sessions grows very fast. To add one additional content server node to the existing environment could improve system throughput significantly.
Officially we suggests adding one additional content server node on every 300 concurrent users’ growth. The mileage

which is actually not true; on the other hand, if OpenText had written something like “our product fails to take advantage of best practices and does not pool database connections”, it would look ridiculous, so instead of improving the product they preferred to declare another piece of marketing bullshit.

So, why is database connection pooling important? If you ask Google you will find something like: creating a database connection is an expensive and time-consuming operation – the application needs to perform a TCP (or even TLS) handshake and authenticate, the database needs to start a new process, etc. – so it is recommended to pool database connections. Unfortunately, that is only half of the truth – pools also limit the number of concurrent database connections, and that is important too; let me quote the best Oracle database expert ever:

In looking at your Automatic Workload Repository report, I see that the longest-running events at the system level are latch-related: cache buffers chains and library cache. Additionally, your CPU time was way up there. Concurrency-based waits are caused by one thing and one thing only: having many concurrently active sessions. If you had fewer concurrently active sessions, you would by definition have fewer concurrency-based waits (fewer people contending with each other). I see that you had 134 sessions in the database running on a total of 4 CPU cores. Because it is not really possible for 4 CPU cores to allow 134 sessions to be concurrently active, I recommend that you decrease the number of concurrent sessions by reducing the size of your connection pool—radically. Cut your connection pool to something like 32. It is just not possible for 4 cores to run 134 things at the same time; 4 cores can run only 4 things at exactly the same time. If you reduce the concurrent user load, you’ll see those waits go down, response time go down, and your transaction rates improve. In short, you’ll find out that you can actually do more work with fewer sessions than you can with more.

I know that this fewer-does-more suggestion sounds counterintuitive, but I encourage you to watch this narrated Real World Performance video.

In this video, you’ll see what happens in a test of a 12-core machine running transactions and decreasing the size of a connection pool from an unreasonable number (in the thousands) to a reasonable number: 96. At 96 connections on this particular machine, the machine was able to do 150 percent the number of transactions per second and took the response time for these transactions from ~100 milliseconds down to ~5 milliseconds.

Short of reducing your connection pool size (and therefore changing the way the application is using the database by queuing in the middle-tier connection pool instead of queuing hundreds of active sessions in the database), you would have to change your queries to make them request cache buffers chains latches less often. In short: tune the queries and the algorithms in the application. There is literally no magic here. Tweaking things at the system level might not be an option. Touching the application might have to be an option.

And from the Documentum perspective, the only option to limit the number of database connections is to use the shared server feature (fuck yeah, Oracle has supported Database Resident Connection Pooling since 11g, but the mature product does not). And do not pay much attention to EMC documents like Optimizing Oracle for EMC Documentum – such documents are wrong from beginning to end.
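
For reference, a minimal shared server sketch (all values are illustrative; clients must also connect through a shared, not dedicated, server connection):

ALTER SYSTEM SET dispatchers = '(PROTOCOL=TCP)(DISPATCHERS=4)' SCOPE = BOTH;
ALTER SYSTEM SET shared_servers = 32 SCOPE = BOTH;
ALTER SYSTEM SET max_shared_servers = 64 SCOPE = BOTH;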

Database connections

Let’s discuss another statement from the performance guide (setting aside the fact that the writer has difficulties with math – I do believe that 2x2x250 is 1000, not 2000):

Tune PROCESSES: As a rule of thumb, PROCESSES should be set to the double of the sum of maximum concurrent sessions of all content server instances. For example, if there are two instances each with concurrent_sessions set to 250, PROCESSES should be set to 2000 (2x2x250). It needs to specify the scope in the alter command when change the value.
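
For reference, PROCESSES is a static parameter, so the change only goes into the spfile and takes effect after an instance restart (the value below is illustrative):

ALTER SYSTEM SET processes = 1000 SCOPE = SPFILE;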

Well, the question is: why does a docbase session require two database connections? Actually, it is worth writing a continuation of the “r_object_id. Type tag” blogpost and describing how Content Server (actually DFC) generates identifiers, but the common idea is the following: when DFC creates a new object (i.e. when we call the IDfSession#newObject method) it sends a NEXT_ID or NEXT_ID_LIST RPC command to Content Server:

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,10
...
q0
API> ?,c,q0
next_id         
----------------
08024be98005fe92
08024be98005fe93
08024be98005fe94
08024be98005fe95
08024be98005fe96
08024be98005fe97
08024be98005fe98
08024be98005fe99
08024be98005fe9a
08024be98005fe9b
(1 row affected)

and Content Server modifies data stored in the dmi_sequence_s table. Now imagine that we are creating tons of objects in parallel and, moreover, taking advantage of transactions – how do the NEXT_ID/NEXT_ID_LIST commands work in this case? If CS modified data in the dmi_sequence_s table inside the transaction, the corresponding rows would become locked and another transaction would be unable to modify the same data; so, to prevent such concurrency issues when performing NEXT_ID/NEXT_ID_LIST commands, Content Server creates a new (temporary) database connection and never releases it:

API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    14
(1 row affected)

API> apply,c,,NEXT_ID_LIST,TAG,I,8,HOW_MANY,I,1000
...
q0
API> ?,c,execute EXEC_SELECT_SQL with QUERY='SELECT COUNT(*) cnt 
     FROM v$session WHERE USERNAME=SYS_CONTEXT(''USERENV'',''SESSION_USER'')'
cnt                   
----------------------
                    15
(1 row affected)

You may observe the reference to this temporary database connection in the output of the SHOW_SESSIONS command (note the value of tempdb_session_ids; also note that, in general, tempdb_session_ids in SHOW_SESSIONS output displays wrong data):

USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328
  pid                          [0]: 21067
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:33:16
  last_used                    [0]: 5/10/2017 13:36:52
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, NEXT_ID_LIST, EXEC, EXEC, TIME, FETCH_TYPE, GET_SERVER_CONFI, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS,
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:34:39
  transaction_status           [0]: Off
  dormancy_status              [0]: Active

So, in the long term (actually, both CS and DFC try to pool “available” identifiers) every docbase session tends to have at least two associated database connections – that is a starting point for database capacity planning. But the question is whether it is an upper bound or not, i.e. if we follow the “best practices” and set the value of the PROCESSES parameter to double the sum of the maximum concurrent sessions of all Content Server instances, will that prevent us from getting “ORA-00020: maximum number of processes (%s) exceeded” errors or not? And, unfortunately, the answer is: no, it won’t:

API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> execquery,c,F,select * from dm_user
...
OK
API> apply,c,,SHOW_SESSIONS
...
q7
API> next,c,q7
...
OK
API> dump,c,q7
...
USER ATTRIBUTES

  root_start                      : 5/10/2017 09:17:05
  root_pid                        : 2502
  shared_mem_id                   : 7667716
  semaphore_id                    : 327685
  session                      [0]: 01024be9800060d7
  db_session_id                [0]: 341
  typelockdb_session_id        [0]: -1
  tempdb_session_ids           [0]: 328:74:352:281:270:141
  pid                          [0]: 21787
  user_name                    [0]: dmadmin
  user_authentication          [0]: Trusted Client
  client_host                  [0]: docu72dev01
  client_lib_ver               [0]: 7.2.0030.0072
  client_locale                [0]: (Linux :(8201), Version: Linux), CharSet: UTF-8, Language: English_US, UTC Offset: 14400, Date Form
  start                        [0]: 5/10/2017 13:45:30
  last_used                    [0]: 5/10/2017 13:45:37
  session_status               [0]: Active
  shutdown_flag                [0]: none
  last_rpc                     [0]: EXEC, EXEC, EXEC, EXEC, EXEC, GET_ERRORS, AUTHENTICATE_USE, ENTRY_POINTS, 
  current_rpc                  [0]: SHOW_SESSIONS
  last_completed_rpc           [0]: 5/10/2017 13:45:34
  transaction_status           [0]: Off
  dormancy_status              [0]: Active

dfc.query.should_include_object_name

I never thought that my colleagues might teach me something…

Yesterday I asked my colleague, who is trying to improve his skills in performance optimisation, whether he had any idea how to improve this SQL statement:

SELECT ALL dm_folder.r_object_id
  FROM dm_folder_sp dm_folder
 WHERE     (    EXISTS
                   (SELECT r_object_id
                      FROM dm_folder_r
                     WHERE     dm_folder.r_object_id = r_object_id
                           AND r_folder_path = :"SYS_B_00")
            AND (dm_folder.object_name = :"SYS_B_01"))
       AND (    dm_folder.i_has_folder = :"SYS_B_02"
            AND dm_folder.i_is_deleted = :"SYS_B_03")

and, surprisingly, the answer was: “Yes, I have seen something similar on the support site – EMC suggests setting the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties, something like:

dfc.query.object_name_for_docbase[0]=<docbase_name>
dfc.query.should_include_object_name[0]=false


Well, as expected, both the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties are undocumented, so let’s discuss the problem more thoroughly.

Imagine that we are maintaining the following folder structure in our docbase:

\_CLIENT_1
  \_CLAIMS
  \_INVOICES
\_CLIENT_2
  \_CLAIMS
  \_INVOICES
...
\_CLIENT_X
  \_CLAIMS
  \_INVOICES

i.e. for every client we create the same folder structure, and when we want to store an invoice for a particular client we do something like:

create,c,dm_document
set,c,l,object_name
xxx
link,c,/CLIENTS/CLIENT_1/INVOICES
save,c,l

The problem is that upon the link call DFC calls the IDfSession#getFolderByPath method to retrieve the folder object with a particular path, and inside IDfSession#getFolderByPath DFC does the following: it cuts off the object name part from the path (i.e. everything after the last ‘/’) and builds the following DQL query:

SELECT r_object_id FROM dm_folder 
WHERE object_name='INVOICES' 
 AND ANY r_folder_path='/CLIENTS/CLIENT_1/INVOICES'

Such an implementation is a bit weird for two reasons:

  • when I do the same I just write something like “retrieve,c,dm_folder where any r_folder_path='…'” and do not bother myself with the object name
  • Content Server has a built-in FolderIdFindByPath RPC command:
    API> apply,c,,FolderIdFindByPath,_FOLDER_PATH_,S,/dmadmin
    ...
    q0
    API> next,c,q0
    ...
    OK
    API> get,c,q0,result
    ...
    0c01d92080000105
    API> close,c,q0
    ...
    OK

    which generates the following effective SQL statement:

    select r_object_id from dm_folder_r where r_folder_path = :p0

so, I have no idea why DFC performs extra logic here; moreover, with the current DFC implementation we get an overcomplicated SQL query, and sometimes the database engine fails to build a good execution plan for it (this is caused by the dumb recommendation to set the CURSOR_SHARING database parameter to FORCE, and, depending on the docbase structure, execution of such a query may take minutes). Below are two possible execution plans for this query:

good (dm_folder_r is the leading table – querying the dm_folder_r table by r_folder_path will never return more than one row):

and bad (dm_folder_r is not the leading table – imagine that we have 1 million clients and hence 1 million INVOICES folders, so querying the dm_sysobject_s table by object_name first will return 1 million records):

in case of “retrieve,c,dm_folder where any r_folder_path='…'” the execution plan is always good:

In 2011 (if my memory serves me right), I solved such a performance problem by marking the index on dm_folder_r(r_folder_path) as unique – in that case the database engine always builds the correct execution plan because it knows that querying the dm_folder_r table will never return more than one row; however, in recent DFC versions it is possible to disable this dumb behaviour by setting the dfc.query.object_name_for_docbase and dfc.query.should_include_object_name properties – I can’t understand why this wasn’t enabled by default.
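
A minimal sketch of that unique-index workaround (the index name is hypothetical; it assumes the stock Documentum index on r_folder_path has been dropped first and that folder paths are indeed unique in your repository):

CREATE UNIQUE INDEX idx_folder_path_u
   ON dm_folder_r (r_folder_path);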

anti-performance series

It has been a year already since I started nurturing the idea of writing a blogpost about performance best practices; unfortunately, such an idea was doomed to failure from the start – there is a lot of material that needs to be verified before posting, and no doubt that would take a lot of time, so I “invented” another format: I will try to prove or ruin statements from performance guides provided by the talented team.

Actually, some performance-related statements were already ruined in previous posts:

Minimizing and consolidating activities
System throughput varies between 3-100 activities per second, depending on system configuration and hardware. Workflows with more activities take longer to complete. The largest performance impact for processing activities results from opening up a new Content Server session. As a result, the biggest performance improvement comes from minimizing the number of discrete activities in a workflow. Minimize the number of workflow activities by, 1) eliminating unnecessary activities altogether or 2) consolidating the steps performed by multiple activities, into a single condensed activity.
To improve the completion rate of individual activities, do the following:

  • Use the bpm_noop template wherever possible. This particular noop does not create an additional template and does not send an HTTP post to the JMS
  • Within the automatic activity, do the work on behalf of a superuser instead of a regular user
  • Turn off auditing whenever unnecessary

Iteratively modify the number of system workflow threads to assess the impact on user response time, activity throughput, and system resource consumption. More workflow threads result in greater automatic activity throughput up to the point where system resource consumption degrades performance. Scale up slowly to understand when resource limitations begin to show (Content Server CPU and database CPU utilization). The following provides some guidelines:

  • A single CPU Content Server host cannot process 10,000 activities per hour, regardless of how it is configured
  • Be cautious if CPU or memory utilization exceeds 80% for any tier in the system
  • Do not configure more than three threads per CPU core

If throughput requirements exceed the capacity that a single Content Server can provide, add more Content Servers. Each Content Server instance (and associated workflow agent) nominally supports 15 concurrent workflow threads. Deploy one Content Server instance for every multiple of 15 concurrent workflow threads required by your solution. Avoid more than 25 workflow threads for any Content Server.

In general the statements above are misleading:

  • I doubt that “The largest performance impact for processing activities results from opening up a new Content Server session”: first, JMS does not open new sessions – all sessions are already in the session pool; the bad thing here is that DFC performs authentication when it acquires a session from the session pool – CS generates a new ticket for every auto-activity and these tickets never match the passwords associated with pooled sessions, and, if my memory serves me right, such reauthentication takes 2 RPCs; second, dealing with a workitem typically takes 4 RPCs: begin transaction, acquire, complete, commit (plus Content Server does some extra work: creating the next activity, updating the workflow object, etc.), plus we need to do some useful work (i.e. perform the business logic)
  • workflow delays caused by processing of auto-activities do not affect business users: business users are not robots, they do not complete tasks as quickly as a thought – a couple of extra minutes won’t make a difference. On the other hand, “consolidating” auto-activities has a negative impact on project complexity: you need to either consolidate both code and docbase methods or create an extra layer purposed to implement such consolidation (actually, we use the second option, but that wasn’t influenced by performance considerations), so it is much better to keep the code simple even though EMC’s idea about consolidation sounds reasonable
  • I have no idea what the prerequisites were for suggesting invoking auto-activities under a superuser account (I would accept the following scenario: all auto-activities are invoked under the installation owner account and CS/JMS take advantage of trusted authentication, but the workflow agent does not support such an option), but my preferred option is to assign the “previous activity performer” as the performer of an auto-activity and take advantage of dynamic groups – such an approach allows keeping track of the last performer of manual activities, and business users are able to see who sent them a task
  • “10,000 auto-activities per hour for a single CPU host” is an extremely pessimistic estimation – 30,000-50,000 is closer to reality on modern hardware
  • there is no scientific explanation of why we need to limit the number of workflow agents to 25 (extra licence fees?) – I do believe that “2 * number of cores” is a good starting point for any hardware configuration

JMS performance slowdown

About two months ago I noticed a weird behaviour of JMS: after a while (the shortest period I have noticed was 2 days) the code related to “mathematical computations” starts executing extremely slowly – if execution of a certain block of code normally took about 10ms, after a while it takes about 5 minutes; such behaviour is also accompanied by the following observation: the JVM does not utilise all available cores, i.e. I see multiple (>10) “computation” threads executing the same block of code, but the JVM does not consume more than 2 CPU cores.

I tried to eliminate all possible root causes (entropy, GC, RSA libraries), but nothing helped; today I discovered the following topics, which look extremely similar to my performance issue:

And indeed, there is a difference between jboss 7.1 and 7.2 in standalone.sh:

@@ -102,11 +146,6 @@
         if [ "x$NO_COMPRESSED_OOPS" = "x" ]; then
             "$JAVA" $JVM_OPTVERSION -server -XX:+UseCompressedOops -version >/dev/null 2>&1 && PREPEND_JAVA_OPTS="$PREPEND_JAVA_OPTS -XX:+UseCompressedOops"
         fi
-
-        NO_TIERED_COMPILATION=`echo $JAVA_OPTS | $GREP "\-XX:\-TieredCompilation"`
-        if [ "x$NO_TIERED_COMPILATION" = "x" ]; then
-            "$JAVA" $JVM_OPTVERSION -server -XX:+TieredCompilation -version >/dev/null 2>&1 && PREPEND_JAVA_OPTS="$PREPEND_JAVA_OPTS -XX:+TieredCompilation"
-        fi
     fi
 
     JAVA_OPTS="$PREPEND_JAVA_OPTS $JAVA_OPTS"

I have tried to disable the TieredCompilation option and now continue to monitor JMS behaviour 🙂
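
In my case that boiled down to something like the following line in the JBoss configuration (the exact file depends on the JMS installation) – with the flag present in JAVA_OPTS the startup script above no longer re-enables tiered compilation:

# e.g. in standalone.conf
JAVA_OPTS="$JAVA_OPTS -XX:-TieredCompilation"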

JMS high availability feature

The topic of this blogpost seems to be one of the most useless things in Documentum – I already mentioned that in What is wrong in Documentum. Part I, but it seems that the talented team is still trying to “improve” this “feature”:

so, I think this topic deserves a more thorough explanation.

Well, why do we need JMS? Frankly speaking, ten years ago I was bullshitting customers by describing JMS like: “It is an embedded application server, purposed to execute custom business logic” – means nothing, but sounds good 🙂 In real life JMS is a single point of failure which causes 80% of issues. So, if it is so unstable and affects business users, we need to stabilise it somehow – and there is a big difference between the talented team and me: when I see the phrase single point of failure I start thinking about how to eliminate/avoid the “point of failure”; when the talented team sees the same phrase, they think about how to eliminate the “single” (I have no idea why, maybe it is just hard to read the whole phrase).

Quick question: JMS in Documentum 7.3 supports both load-balancing and failover (this was highlighted as a key enhancement) – do we really need load-balancing capabilities? How do we answer this question? Put load on Content Server and measure what resources are consumed by each component; typically, I get something like:

So, Content Server consumes five times more resources than JMS – do I need to load balance JMS? Definitely not! I need to load balance Content Servers.

Now about failover capabilities. Let’s list everything which is typically executed on JMS:

  • workflow methods – I can’t understand why the workflow agent is still a part of Content Server if JMS can do the same; moreover, there is no reason to fail over workflow methods – slight delays in workflows, caused by JMS failures, do not affect business users – you just need to monitor JMS health
  • job methods – the same considerations as above (dm_agent_exec is a piece of shit)
  • lifecycle methods – looks like a design gap: I can’t understand why lifecycle transitions are implemented as docbase methods: the dm_bp_transition_java method just reads information about lifecycle states and executes the corresponding SBOs – why not do the same on the TBO side?
  • custom docbase methods – you are a bloody idiot if custom docbase methods are a part of your application
  • mail methods – looks like a design gap too: it would be more convenient to add a special flag to dmi_queue_item objects indicating whether the corresponding email has been sent or not

Did I miss something? Yes, in Documentum 7.3 the talented team implemented SAML authentication through JMS – taking an open-source C++ implementation was apparently too difficult.

Workflow throughput

Before doing something massive (updates, loads, migrations, etc.) it is a good idea to perform some measurements in order to understand whether your strategy is correct or not; let me provide some examples.

About a year ago my colleagues were trying to change the type of a set of documents (i.e. they just wrote a “CHANGE OBJECT” DQL query); that change worked smoothly on the DEV and TEST environments, but when they tried to apply it on the STAGE environment it took 5 days 🙂 What was wrong? Obviously, my colleagues didn’t have any knowledge about the throughput of the “CHANGE OBJECT” DQL query; actually, I still do not have such knowledge either, because it is not interesting to me, but I believe its throughput is comparable with object creation. That time the performance puzzle was solved by DevOps engineers: they created a large IAPI scenario as described in the “DQL update statement. Behind the scenes” blogpost and took advantage of my bash multiplexer 🙂

Second case: two years ago I was participating in a project aimed at decommissioning an old archiving system and replacing it with Documentum; one of the many steps in the run list was to migrate about 500 million documents from the old system to Documentum (here I’m not sure why EMC didn’t suggest using InfoArchive instead of Documentum – that project was a kind of reference project in AU). The developers wrote a migration tool which had a throughput of about 20 documents per second, and that was a fail – a 300-day ETA sounds ridiculous 🙂

Third case: bulk fetches

Now about the blogpost topic: as I described previously, the workflow engine in Documentum is not well balanced, so, in order to stop DDoSing JMS with e-mail notifications, I introduced a noop method which creates a synthetic queue. What is the performance impact of such an implementation (i.e. what is the throughput of the workflow agent when performing noop auto-activities)? The xCP 1.6 Performance Tuning Guide I advertised in the previous blogpost states the following:

My measurements are the following:

  • A single workflow agent is able to perform about 30,000 trivial (noop) auto-activities per hour
  • A single workflow agent is able to perform about 15,000 non-trivial auto-activities per hour (actually, it depends on the auto-activity – I just added a couple of interactions with CS)
  • It is a bad idea to configure more than 2 workflow agents per CPU core – the “idea of three services, which work sequentially” is wrong because both CS and JMS reside on the same host and consume more than 40% of CPU time:

DFS client performance

All this time I thought it was not possible to impress me where Documentum performance is concerned, but last week my colleagues did just that 🙂 Their problem was the following: they had implemented an integration service which heavily uses Documentum Foundation Services, and last week they started experiencing performance difficulties; from the Java side their observations looked like:

"Thread-1739" daemon prio=10 tid=0x00002aaac0f09000 nid=0x34bc runnable [0x000000004487f000]
   java.lang.Thread.State: RUNNABLE
	at org.jboss.modules.ModuleClassLoader.loadResourceLocal(ModuleClassLoader.java:305)
	at org.jboss.modules.ModuleClassLoader$1.loadResourceLocal(ModuleClassLoader.java:84)
	at org.jboss.modules.Module.getResources(Module.java:592)
	at org.jboss.modules.ModuleClassLoader.findResources(ModuleClassLoader.java:489)
	at org.jboss.modules.ConcurrentClassLoader.getResources(ConcurrentClassLoader.java:288)
	at java.lang.ClassLoader.getResources(ClassLoader.java:1181)
	at com.documentum.fc.client.impl.bof.classmgmt.DelayedDelegationClassLoader.getResources(DelayedDelegationClassLoader.java:103)
	at com.sun.xml.ws.util.ServiceFinder$LazyIterator.hasNext(ServiceFinder.java:351)
	at com.sun.xml.ws.util.ServiceFinder.toArray(ServiceFinder.java:225)
	at com.sun.xml.ws.client.WSServiceDelegate.parseWSDL(WSServiceDelegate.java:265)
	at com.sun.xml.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:228)
	at com.sun.xml.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:176)
	at com.sun.xml.ws.spi.ProviderImpl.createServiceDelegate(ProviderImpl.java:104)
	at javax.xml.ws.Service.<init>(Service.java:77)
	at com.emc.documentum.fs.services.core.ObjectService.<init>(ObjectService.java:68)
	at sun.reflect.GeneratedConstructorAccessor530.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.newWebServiceClient(DfsSoapService.java:176)
	- locked <0x0000000709a71eb0> (a java.lang.Class for com.emc.documentum.fs.services.core.ObjectService)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.init(DfsSoapService.java:69)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.getBindingProvider(DfsSoapService.java:62)
	at com.emc.documentum.fs.rt.context.impl.HttpSessionInvocationHandler.invoke(HttpSessionInvocationHandler.java:54)
	at com.emc.documentum.fs.rt.context.impl.RemoteServiceInterfaceInvocationHandler.invoke(RemoteServiceInterfaceInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.ReturnedContentTransformationHandler.invoke(ReturnedContentTransformationHandler.java:45)
	at com.emc.documentum.fs.rt.context.impl.OperationOptionsHandler.invoke(OperationOptionsHandler.java:74)
	at com.emc.documentum.fs.rt.context.impl.ContextThreadLocalInvocationHandler.invoke(ContextThreadLocalInvocationHandler.java:48)
	at com.emc.documentum.fs.rt.context.impl.ServiceContextInvocationHandler.invoke(ServiceContextInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.FileRegistryCleanupHandler.invoke(FileRegistryCleanupHandler.java:24)
	at com.sun.proxy.$Proxy166.get(Unknown Source)

...

"Thread-1737" daemon prio=10 tid=0x00002aaac0aa1000 nid=0x34ba waiting for monitor entry [0x0000000044d84000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.newWebServiceClient(DfsSoapService.java:171)
	- waiting to lock <0x0000000709b6e710> (a java.lang.Class for com.emc.documentum.fs.services.core.QueryService)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.init(DfsSoapService.java:69)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.getBindingProvider(DfsSoapService.java:62)
	at com.emc.documentum.fs.rt.context.impl.HttpSessionInvocationHandler.invoke(HttpSessionInvocationHandler.java:54)
	at com.emc.documentum.fs.rt.context.impl.RemoteServiceInterfaceInvocationHandler.invoke(RemoteServiceInterfaceInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.ReturnedContentTransformationHandler.invoke(ReturnedContentTransformationHandler.java:45)
	at com.emc.documentum.fs.rt.context.impl.OperationOptionsHandler.invoke(OperationOptionsHandler.java:74)
	at com.emc.documentum.fs.rt.context.impl.ContextThreadLocalInvocationHandler.invoke(ContextThreadLocalInvocationHandler.java:48)
	at com.emc.documentum.fs.rt.context.impl.ServiceContextInvocationHandler.invoke(ServiceContextInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.FileRegistryCleanupHandler.invoke(FileRegistryCleanupHandler.java:24)
	at com.sun.proxy.$Proxy150.execute(Unknown Source)

...

"Thread-1736" daemon prio=10 tid=0x00002aaac141e000 nid=0x34b5 runnable [0x000000004054f000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.initHashSeedAsNeeded(HashMap.java:337)
	at java.util.HashMap.inflateTable(HashMap.java:317)
	at java.util.HashMap.<init>(HashMap.java:296)
	at java.util.LinkedHashMap.<init>(LinkedHashMap.java:212)
	at sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:270)
	at sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:117)
	at sun.reflect.annotation.AnnotationParser.parseAnnotations(AnnotationParser.java:70)
	at java.lang.reflect.Field.declaredAnnotations(Field.java:1128)
	- locked <0x00000007b23b6590> (a java.lang.reflect.Field)
	at java.lang.reflect.Field.getDeclaredAnnotations(Field.java:1121)
	at java.lang.reflect.AccessibleObject.getAnnotations(AccessibleObject.java:197)
	at com.sun.xml.bind.v2.model.annotation.RuntimeInlineAnnotationReader.getAllFieldAnnotations(RuntimeInlineAnnotationReader.java:69)
	at com.sun.xml.bind.v2.model.annotation.RuntimeInlineAnnotationReader.getAllFieldAnnotations(RuntimeInlineAnnotationReader.java:53)
	at com.sun.xml.bind.v2.model.impl.ClassInfoImpl.findFieldProperties(ClassInfoImpl.java:371)
	at com.sun.xml.bind.v2.model.impl.ClassInfoImpl.getProperties(ClassInfoImpl.java:301)
	at com.sun.xml.bind.v2.model.impl.RuntimeClassInfoImpl.getProperties(RuntimeClassInfoImpl.java:176)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:243)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:100)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:209)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:95)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:315)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:270)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:100)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:209)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:95)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:315)
	at com.sun.xml.bind.v2.model.impl.TypeRefImpl.calcRef(TypeRefImpl.java:92)
	at com.sun.xml.bind.v2.model.impl.TypeRefImpl.getTarget(TypeRefImpl.java:69)
	at com.sun.xml.bind.v2.model.impl.RuntimeTypeRefImpl.getTarget(RuntimeTypeRefImpl.java:58)
	at com.sun.xml.bind.v2.model.impl.RuntimeTypeRefImpl.getTarget(RuntimeTypeRefImpl.java:51)
	at com.sun.xml.bind.v2.model.impl.ElementPropertyInfoImpl$1.get(ElementPropertyInfoImpl.java:74)
	at com.sun.xml.bind.v2.model.impl.ElementPropertyInfoImpl$1.get(ElementPropertyInfoImpl.java:77)
	at java.util.AbstractList$Itr.next(AbstractList.java:358)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:255)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:100)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:209)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:95)
	at com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:81)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:315)
	at com.sun.xml.bind.v2.model.impl.RegistryInfoImpl.<init>(RegistryInfoImpl.java:99)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.addRegistry(ModelBuilder.java:357)
	at com.sun.xml.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:327)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:441)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:288)
	at com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1111)
	at com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:154)
	at com.sun.xml.bind.api.JAXBRIContext.newInstance(JAXBRIContext.java:106)
	at com.sun.xml.ws.developer.JAXBContextFactory$1.createJAXBContext(JAXBContextFactory.java:109)
	at com.sun.xml.ws.model.AbstractSEIModelImpl$1.run(AbstractSEIModelImpl.java:159)
	at com.sun.xml.ws.model.AbstractSEIModelImpl$1.run(AbstractSEIModelImpl.java:152)
	at java.security.AccessController.doPrivileged(Native Method)
	at com.sun.xml.ws.model.AbstractSEIModelImpl.createJAXBContext(AbstractSEIModelImpl.java:151)
	at com.sun.xml.ws.model.AbstractSEIModelImpl.postProcess(AbstractSEIModelImpl.java:94)
	at com.sun.xml.ws.model.RuntimeModeler.buildRuntimeModel(RuntimeModeler.java:255)
	at com.sun.xml.ws.client.WSServiceDelegate.createSEIPortInfo(WSServiceDelegate.java:698)
	at com.sun.xml.ws.client.WSServiceDelegate.addSEI(WSServiceDelegate.java:686)
	at com.sun.xml.ws.client.WSServiceDelegate.getPort(WSServiceDelegate.java:341)
	at com.sun.xml.ws.client.WSServiceDelegate.getPort(WSServiceDelegate.java:324)
	at com.sun.xml.ws.client.WSServiceDelegate.getPort(WSServiceDelegate.java:306)
	at javax.xml.ws.Service.getPort(Service.java:119)
	at com.emc.documentum.fs.services.core.QueryService.getQueryServicePort(QueryService.java:66)
	at sun.reflect.GeneratedMethodAccessor143.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.newWebServiceClient(DfsSoapService.java:179)
	- locked <0x0000000709b6e710> (a java.lang.Class for com.emc.documentum.fs.services.core.QueryService)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.init(DfsSoapService.java:69)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.getBindingProvider(DfsSoapService.java:62)
	at com.emc.documentum.fs.rt.context.impl.HttpSessionInvocationHandler.invoke(HttpSessionInvocationHandler.java:54)
	at com.emc.documentum.fs.rt.context.impl.RemoteServiceInterfaceInvocationHandler.invoke(RemoteServiceInterfaceInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.ReturnedContentTransformationHandler.invoke(ReturnedContentTransformationHandler.java:45)
	at com.emc.documentum.fs.rt.context.impl.OperationOptionsHandler.invoke(OperationOptionsHandler.java:74)
	at com.emc.documentum.fs.rt.context.impl.ContextThreadLocalInvocationHandler.invoke(ContextThreadLocalInvocationHandler.java:48)
	at com.emc.documentum.fs.rt.context.impl.ServiceContextInvocationHandler.invoke(ServiceContextInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.FileRegistryCleanupHandler.invoke(FileRegistryCleanupHandler.java:24)
	at com.sun.proxy.$Proxy150.execute(Unknown Source)

...

"EJB default - 5" prio=10 tid=0x0000000017f20000 nid=0x77a9 waiting for monitor entry [0x0000000043f75000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.newWebServiceClient(DfsSoapService.java:171)
	- waiting to lock <0x00000007098434a8> (a java.lang.Class for com.emc.documentum.fs.services.core.QueryService)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.init(DfsSoapService.java:69)
	at com.emc.documentum.fs.rt.context.impl.DfsSoapService.getBindingProvider(DfsSoapService.java:62)
	at com.emc.documentum.fs.rt.context.impl.HttpSessionInvocationHandler.invoke(HttpSessionInvocationHandler.java:54)
	at com.emc.documentum.fs.rt.context.impl.RemoteServiceInterfaceInvocationHandler.invoke(RemoteServiceInterfaceInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.ReturnedContentTransformationHandler.invoke(ReturnedContentTransformationHandler.java:45)
	at com.emc.documentum.fs.rt.context.impl.OperationOptionsHandler.invoke(OperationOptionsHandler.java:74)
	at com.emc.documentum.fs.rt.context.impl.ContextThreadLocalInvocationHandler.invoke(ContextThreadLocalInvocationHandler.java:48)
	at com.emc.documentum.fs.rt.context.impl.ServiceContextInvocationHandler.invoke(ServiceContextInvocationHandler.java:30)
	at com.emc.documentum.fs.rt.context.impl.FileRegistryCleanupHandler.invoke(FileRegistryCleanupHandler.java:24)
	at com.sun.proxy.$Proxy193.execute(Unknown Source)

Obviously, the heavy calls are com.sun.xml.ws.client.WSServiceDelegate#parseWSDL and com.sun.xml.bind.api.JAXBRIContext#newInstance; plus somebody at EMC made it even slower by placing the synchronized keyword into the com.emc.documentum.fs.rt.context.impl.DfsSoapService#newWebServiceClient method. The corresponding Java code, written by my colleagues, looked like:

public static QueryResult query(final String query,
        final String repository, final String userName,
        final String password) throws ServiceException {
    IServiceContext context = ContextFactory.getInstance().newContext();
    Identity identity = new RepositoryIdentity(repository, userName,
            password, null);
    context.setIdentities(Arrays.asList(identity));
    IQueryService service = ServiceFactory.getInstance().getRemoteService(
            IQueryService.class, context);
    return service.execute(new PassthroughQuery(Arrays.asList(repository),
            query), null, null);
}

I’m not sure whether the above pattern is optimal from a performance perspective, because the documentation says nothing about the thread-safety of DFS remote services. At first glance, the com.emc.documentum.fs.rt.context.impl.DfsSoapService and com.emc.documentum.fs.rt.context.impl.HttpSessionInvocationHandler classes do not seem to be thread-safe, so we can’t share the same instance of IQueryService among multiple threads; moreover, it seems we can’t reuse instances of DFS remote services either. So, instantiating a new instance of IQueryService on every request sounds reasonable, but its performance sucks. After some trial and error I found the following option to improve the performance of DFS clients — a custom JAX-WS provider that caches ServiceDelegate instances (and thus the parsed WSDL and JAXB context) keyed by WSDL location, service name and service class:

package pro.documentum.dfs.remote.cache;

import java.net.URL;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import javax.xml.namespace.QName;
import javax.xml.transform.Source;
import javax.xml.ws.Endpoint;
import javax.xml.ws.EndpointReference;
import javax.xml.ws.Service;
import javax.xml.ws.WebServiceException;
import javax.xml.ws.WebServiceFeature;
import javax.xml.ws.spi.Invoker;
import javax.xml.ws.spi.Provider;
import javax.xml.ws.spi.ServiceDelegate;
import javax.xml.ws.wsaddressing.W3CEndpointReference;

import org.w3c.dom.Element;

/**
 * JAX-WS {@link Provider} that caches {@link ServiceDelegate} instances so that
 * the WSDL is parsed and the JAXB context is built only once per service.
 *
 * @author Andrey B. Panfilov <andrey@panfilov.tel>
 */
public class CachingProviderImpl extends Provider {

    private static final ConcurrentMap<DelegateKey, ServiceDelegate> CACHE;

    private static final Provider DELEGATE;

    static {
        CACHE = new ConcurrentHashMap<>();
        DELEGATE = getProviderLoader();
    }

    public CachingProviderImpl() {
        super();
    }

    private static Provider getProviderLoader() {
        // pick the first "real" JAX-WS provider from the classpath, skipping ourselves
        ServiceLoader<Provider> loader = ServiceLoader.load(Provider.class);
        Iterator<Provider> iterator = loader.iterator();
        Provider provider = null;
        while (iterator.hasNext()) {
            provider = iterator.next();
            if (isSelf(provider)) {
                continue;
            }
            return provider;
        }
        // fall back to the default lookup
        provider = Provider.provider();
        if (isSelf(provider)) {
            throw new WebServiceException("Unable to create delegate Provider");
        }
        return provider;
    }

    public static boolean isSelf(final Provider provider) {
        return provider instanceof CachingProviderImpl;
    }

    @Override
    public ServiceDelegate createServiceDelegate(
            final URL wsdlDocumentLocation, final QName serviceName,
            final Class<? extends Service> serviceClass) {
        DelegateKey key = new DelegateKey(wsdlDocumentLocation, serviceName,
                serviceClass);
        ServiceDelegate result = CACHE.get(key);
        if (result == null) {
            // expensive: parses the WSDL and builds the JAXB context
            result = DELEGATE.createServiceDelegate(wsdlDocumentLocation,
                    serviceName, serviceClass);
            // another thread may have created the delegate first, keep that one
            ServiceDelegate temp = CACHE.putIfAbsent(key, result);
            if (temp != null) {
                result = temp;
            }
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    public ServiceDelegate createServiceDelegate(
            final URL wsdlDocumentLocation, final QName serviceName,
            final Class serviceClass, final WebServiceFeature... features) {
        if (features == null || features.length == 0) {
            return createServiceDelegate(wsdlDocumentLocation, serviceName,
                    serviceClass);
        }
        // delegates created with explicit features are not cached
        return DELEGATE.createServiceDelegate(wsdlDocumentLocation,
                serviceName, serviceClass, features);
    }

    @Override
    public Endpoint createEndpoint(final String bindingId,
            final Object implementor) {
        return DELEGATE.createEndpoint(bindingId, implementor);
    }

    @Override
    public Endpoint createAndPublishEndpoint(final String address,
            final Object implementor) {
        return DELEGATE.createAndPublishEndpoint(address, implementor);
    }

    @Override
    public EndpointReference readEndpointReference(final Source eprInfoset) {
        return DELEGATE.readEndpointReference(eprInfoset);
    }

    @Override
    public <T> T getPort(final EndpointReference endpointReference,
            final Class<T> serviceEndpointInterface,
            final WebServiceFeature... features) {
        return DELEGATE.getPort(endpointReference, serviceEndpointInterface,
                features);
    }

    @Override
    public W3CEndpointReference createW3CEndpointReference(
            final String address, final QName serviceName,
            final QName portName, final List<Element> metadata,
            final String wsdlDocumentLocation,
            final List<Element> referenceParameters) {
        return DELEGATE.createW3CEndpointReference(address, serviceName,
                portName, metadata, wsdlDocumentLocation, referenceParameters);
    }

    @Override
    public W3CEndpointReference createW3CEndpointReference(
            final String address, final QName interfaceName,
            final QName serviceName, final QName portName,
            final List<Element> metadata, final String wsdlDocumentLocation,
            final List<Element> referenceParameters,
            final List<Element> elements, final Map<QName, String> attributes) {
        return DELEGATE.createW3CEndpointReference(address, interfaceName,
                serviceName, portName, metadata, wsdlDocumentLocation,
                referenceParameters, elements, attributes);
    }

    @Override
    public Endpoint createAndPublishEndpoint(final String address,
            final Object implementor, final WebServiceFeature... features) {
        return DELEGATE
                .createAndPublishEndpoint(address, implementor, features);
    }

    @Override
    public Endpoint createEndpoint(final String bindingId,
            final Object implementor, final WebServiceFeature... features) {
        return DELEGATE.createEndpoint(bindingId, implementor, features);
    }

    @Override
    public Endpoint createEndpoint(final String bindingId,
            final Class<?> implementorClass, final Invoker invoker,
            final WebServiceFeature... features) {
        return DELEGATE.createEndpoint(bindingId, implementorClass, invoker,
                features);
    }

    static class DelegateKey {

        private final URL _wsdlLocation;

        private final QName _serviceName;

        private final Class<? extends Service> _serviceClass;

        DelegateKey(final URL wsdlLocation, final QName serviceName,
                final Class<? extends Service> serviceClass) {
            _wsdlLocation = wsdlLocation;
            _serviceName = serviceName;
            _serviceClass = serviceClass;
        }

        public URL getWsdlLocation() {
            return _wsdlLocation;
        }

        public QName getServiceName() {
            return _serviceName;
        }

        public Class<? extends Service> getServiceClass() {
            return _serviceClass;
        }

        @Override
        public boolean equals(final Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof DelegateKey)) {
                return false;
            }
            DelegateKey that = (DelegateKey) o;
            return Objects.equals(_wsdlLocation, that._wsdlLocation)
                    && Objects.equals(_serviceName, that._serviceName)
                    && Objects.equals(_serviceClass, that._serviceClass);
        }

        @Override
        public int hashCode() {
            return Objects.hash(_wsdlLocation, _serviceName, _serviceClass);
        }

    }

}

META-INF/services/javax.xml.ws.spi.Provider:

pro.documentum.dfs.remote.cache.CachingProviderImpl
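
With this service-provider file on the classpath, the JAX-WS runtime resolves javax.xml.ws.spi.Provider to CachingProviderImpl, so the client code shown earlier stays exactly the same: only the first call per (WSDL location, service name, service class) triple pays the WSDL-parsing and JAXB-context cost, and subsequent calls reuse the cached ServiceDelegate. Below is a minimal sketch (the CachingProviderCheck class is mine, purely illustrative) for verifying which provider the runtime actually picked up:

import javax.xml.ws.spi.Provider;

public final class CachingProviderCheck {

    private CachingProviderCheck() {
    }

    public static void main(final String[] args) {
        // Provider.provider() consults META-INF/services/javax.xml.ws.spi.Provider,
        // so with the file above on the classpath it should print CachingProviderImpl
        Provider provider = Provider.provider();
        System.out.println("JAX-WS provider: " + provider.getClass().getName());
    }
}

Note that sharing a cached ServiceDelegate across threads is an assumption here: the JAX-WS specification does not guarantee thread-safety of service delegates, so it is worth verifying against the concrete runtime used by your container.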