It is worth reading about the probability density function before continuing with this post.
One of my skypemates is worried about the following performance problem. His team developed an integration between a customer application and EMC Documentum. Initially the integration used DFS, but later, due to performance issues, they switched to pure DFC. After that they had to upgrade to the latest supported DFC version, and since then the customer has been complaining about performance. My skypemate attributes these complaints to changes made in the latest versions of DFC (ISM), but I’m very skeptical about his suspicions. In my opinion any upgrade is a stress for a production environment, and any upgrade gives the customer a reason to complain. This is the nature of any customer: right after some activity has been performed in the production environment, it always seems like a brilliant idea to bring up old issues.
Though I do not share my skypemate’s opinion, his suspicions gave me the idea to write a series of posts about session management in DFC. But before discussing the DFC implementation we need some basic measurements for typical DFC operations, otherwise we will not be able to tell which operations are relatively fast and which ones are slow. To perform these measurements I designed a series of microbenchmarks. There are about a dozen of them, but currently we are interested in only four: connection, authentication, object creation and object fetch.
The main benchmark class (tel.panfilov.documentum.benchmark.Benchmark) spawns a given number of threads. Each thread performs a certain operation in a loop and increments a global counter after every successful completion. The main thread reads the global counter at fixed time intervals and prints the difference from the previous value to standard output. Here I assume that all such measurements are independent. The diagrams shown below were created with the following parameters:
- number of concurrent threads = 1
- number of measurements = 1000
- sleep time between measurements in milliseconds = 10000
    java tel.panfilov.documentum.benchmark.Benchmark ssc_dev dmadmin dmadmin \
    > tel.panfilov.documentum.benchmark.impl.Connection 1 1000
    Executions per 10000ms: 2, iteration: 1
    Executions per 10000ms: 3, iteration: 2
    Executions per 10000ms: 4, iteration: 3
    Executions per 10000ms: 3, iteration: 4
    ...
    Executions per 10000ms: 3, iteration: 998
    Executions per 10000ms: 2, iteration: 999
    Executions per 10000ms: 3, iteration: 1000
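The measurement scheme described above (workers bump a shared counter, the main thread samples it at fixed intervals and reports the difference) can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Java Benchmark class: the class name `ThroughputBenchmark`, its parameters and the `time.sleep` stand-in for a DFC operation are all assumptions made for the example.

```python
import threading
import time


class ThroughputBenchmark:
    """Hypothetical harness: counts successful operations per sampling interval."""

    def __init__(self, operation, threads=1):
        self._operation = operation      # callable that performs one operation
        self._threads = threads
        self._count = 0                  # global counter shared by all workers
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def _worker(self):
        while not self._stop.is_set():
            try:
                self._operation()
            except Exception:
                continue                 # failed operations are not counted
            with self._lock:
                self._count += 1

    def run(self, measurements, interval_s):
        workers = [threading.Thread(target=self._worker, daemon=True)
                   for _ in range(self._threads)]
        for worker in workers:
            worker.start()
        deltas, previous = [], 0
        for iteration in range(1, measurements + 1):
            time.sleep(interval_s)
            with self._lock:
                current = self._count
            deltas.append(current - previous)   # executions since the last sample
            print("Executions per %dms: %d, iteration: %d"
                  % (interval_s * 1000, current - previous, iteration))
            previous = current
        self._stop.set()
        for worker in workers:
            worker.join()
        return deltas


# Stand-in operation: sleeping 1 ms instead of talking to a content server.
samples = ThroughputBenchmark(lambda: time.sleep(0.001), threads=1).run(3, 0.1)
```

The list of per-interval counts returned by `run` is the kind of sample from which a histogram or density diagram can then be drawn.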
Connection and authentication benchmarks
The Connection benchmark was designed not to use session pooling, because I wanted to understand how slowly the Documentum stack performs connection handshaking. The result is extremely poor: 6-8 connections per second.
Because authentication is a part of connection handshaking, I designed the Authentication benchmark to understand how slowly Documentum performs authentication. The result for a user with an inline password is 30-40 successful authentications per second (unix authentication is 4 times slower, and I suppose that LDAP authentication is slower still).
Compare these results with an Oracle database (OracleJDBC benchmark):
Why do the connection and authentication benchmarks demonstrate such poor performance?
When a DFC client tries to establish a new connection with the content server, it performs the following sequence of actions:
- establishes a TCP connection with the content server (after the TCP handshake the content server spawns a new process/thread, which establishes a connection with the underlying database)
- sends the “new session” RPC
- requests the list of available RPC commands by sending the “ENTRY_POINTS” RPC command
- sends the “AUTHENTICATE_USER” RPC command
- sends the “GET_ERRORS” RPC command
- receives available messages from the content server
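To make the cost of this sequence easier to see, here is a toy model that replays the steps against a fake server and records every request. Nothing here talks to a real content server: `FakeContentServer`, `connect` and the labels `TCP_CONNECT` and `NEW_SESSION` are made up for the illustration, and only `ENTRY_POINTS`, `AUTHENTICATE_USER` and `GET_ERRORS` are names taken from the list above.

```python
class FakeContentServer:
    """Stand-in for a content server; it only records the requests it receives."""

    def __init__(self):
        self.log = []

    def request(self, name):
        self.log.append(name)
        # Pretend the server has one queued message to hand back on GET_ERRORS.
        return ["session started"] if name == "GET_ERRORS" else []


def connect(server):
    """Replay the handshake steps listed above, in order, and return the messages."""
    server.request("TCP_CONNECT")        # hypothetical label for the TCP handshake
    server.request("NEW_SESSION")        # hypothetical label for the "new session" RPC
    server.request("ENTRY_POINTS")       # list of available RPC commands
    server.request("AUTHENTICATE_USER")
    return server.request("GET_ERRORS")  # messages arrive with this response


server = FakeContentServer()
messages = connect(server)
print("%d requests for one connection: %s" % (len(server.log), ", ".join(server.log)))
```

In this model a single connection costs five exchanges with the server before any useful work is done.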
In fact, the algorithm described above has the following performance gaps:
- it would be a good idea for the content server to maintain several spare or idle server processes/threads that stand ready to serve incoming requests, instead of spawning a new one for every connection
- there is no reason to request entry points on every “connection” request: entry points should be cached on the client side
- there is no reason to request messages from the server by sending the “GET_ERRORS” RPC command: this step is actually initiated by the content server, which sets extra flags in the response to the “AUTHENTICATE_USER” RPC command:
    [dmadmin@docu70dev01 ~]$ iapi
    Please enter a docbase name (docubase): ssc_dev
    Please enter a user (dmadmin):
    Please enter password for dmadmin:

    EMC Documentum iapi - Interactive API interface
    (c) Copyright EMC Corp., 1992 - 2012
    All rights reserved.
    Client Library Release 7.0.0130.0537

    Connecting to Server using docbase ssc_dev
    // receiving of this message is initiated by CS
    [DM_SESSION_I_SESSION_START]info: "Session 0101ffd78010db5e started for user dmadmin."

    Connected to Documentum Server running Release 7.0.0140.0644 Linux.Oracle
    Session id is s0
    API>
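The second point, caching entry points on the client side, amounts to plain memoization per docbase. The sketch below shows the idea under stated assumptions: `EntryPointCache`, the `fetch` callable and the sample entry point values are hypothetical and are not part of DFC or any other client library.

```python
import threading


class EntryPointCache:
    """Hypothetical client-side cache: entry points are fetched once per docbase."""

    def __init__(self, fetch):
        self._fetch = fetch              # callable(docbase_id) -> dict of entry points
        self._by_docbase = {}
        self._lock = threading.Lock()

    def get(self, docbase_id):
        with self._lock:
            if docbase_id not in self._by_docbase:
                # Only the first connection to a docbase pays for the extra RPC.
                self._by_docbase[docbase_id] = self._fetch(docbase_id)
            return self._by_docbase[docbase_id]


fetch_calls = []


def fake_fetch(docbase_id):
    """Stand-in for the ENTRY_POINTS round trip; the returned values are made up."""
    fetch_calls.append(docbase_id)
    return {"AUTHENTICATE_USER": 1, "GET_ERRORS": 2}


cache = EntryPointCache(fake_fetch)
for _ in range(1000):                    # 1000 "connections" to the same docbase
    entry_points = cache.get(131031)
print("ENTRY_POINTS requested %d time(s)" % len(fetch_calls))
```

The lock is held during the fetch, so concurrent first connections wait for one round trip rather than each issuing their own. A real client would also have to decide when to invalidate the cache, for example after a server upgrade.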
Furthermore, the content server does not cache a user’s credentials after successful authentication, so every time authentication is performed the content server “honestly” checks the user’s credentials again. So the result is “predictable”. To confirm my suspicions about the suboptimal connection handshaking algorithm, I hacked my dctmpy library to measure how fast it performs only the first and second steps of connection handshaking (plus the disconnect), and got the expected result: about 25 connections per second, i.e. a 3 times improvement:
    [dmadmin@docu67dev02 ~]$ cat > test1.py
    # coding=utf-8
    from timeit import timeit


    def main():
        # setup runs once and is not timed: it fetches the entry points
        # from a throwaway session so that the timed code can reuse them
        setup = """\
    from dctmpy.docbaseclient import DocbaseClient
    session = DocbaseClient(
        host="192.168.2.56", port=12000, docbaseid=131031
    )
    entrypoints = session.entrypoints
    session.disconnect()
    """
        # timed statement: connect with the cached entry points, then disconnect
        stmt = """\
    session = DocbaseClient(
        host="192.168.2.56", port=12000,
        entrypoints=entrypoints, docbaseid=131031
    )
    session.disconnect()
    """
        A = timeit(setup=setup, stmt=stmt, number=1000)
        print("%15s %6.2fs" % ("Python", A))


    if __name__ == '__main__':
        main()
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 39.08s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 36.94s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 40.12s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 39.45s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 36.24s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 37.98s
    [dmadmin@docu67dev02 ~]$ python test1.py
    Python 42.81s
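For reference, the figure of about 25 connections per second follows directly from the seven timings above, each of which covers 1000 connect/disconnect cycles. The snippet only redoes that arithmetic; the helper name `connections_per_second` is mine.

```python
# Wall-clock seconds reported by the seven test1.py runs (1000 cycles each).
timings_s = [39.08, 36.94, 40.12, 39.45, 36.24, 37.98, 42.81]
cycles_per_run = 1000


def connections_per_second(timings, cycles):
    """Total number of connect/disconnect cycles divided by total elapsed time."""
    return cycles * len(timings) / sum(timings)


rate = connections_per_second(timings_s, cycles_per_run)
print("%.1f connections per second" % rate)   # prints "25.7 connections per second"
```

The slowest run (42.81s) corresponds to roughly 23 connections per second and the fastest (36.24s) to roughly 28, so all seven runs land in the same range.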
Object creation and fetch
The results for these two benchmarks are disappointing too: I believe they should be at least 10 times better.
About 20 sysobject creations per second:
About 80-100 fetches per second: