A FATAL error has occurred

Have you ever seen such error:

?

I believe everybody who tried to deploy Documentum in large enterprise had faced with such spontaneous errors but never paid much attention because error message is completely misleading: “[DM_SESSION_E_AUTH_FAIL]error: “Authentication failed for user” – dumb user enters invalid password, so it’s not our issue. But actually, this error reveals a lot of problem related to session management in Documentum.

Root cause

If take a careful look at stacktrace it becomes clear that the error originates not from login page but somewhere from WDK (put any other application here) intestines:

at com.documentum.web.formext.privilege.PrivilegeService.getUserPrivilege(PrivilegeService.java:57)
at com.documentum.web.formext.config.PrivilegeQualifier.getScopeValue(PrivilegeQualifier.java:69)
at com.documentum.web.formext.config.BoundedContextCache.retrieveQualifierScopeValue(BoundedContextCache.java:153)
at com.documentum.web.formext.config.ScopeKey.<init>(ScopeKey.java:57)
at com.documentum.web.formext.config.ConfigService.makeScopeKey(ConfigService.java:1482)
at com.documentum.web.formext.config.ConfigService.lookupElement(ConfigService.java:527)

this fact means that user was already successfully logged in and this is, obviously, not a user’s mistake. So, what does really happen there? The basic description of the problem is: user’s dfc session got either reused by another user or released by application and when application tries to acquire new dfc session it fails. So, the first question is why application fails to acquire new dfc session. I believe there are a lot of reasons, but the most common for large enterprises is following: ldap authentication in Documentum is unstable: most time it works as expected but sometimes it fails and causes a hardly diagnosable issues.

Mitigation options

  1. disable D7 session pooling in dfc (i.e. set dfc.compatibility.useD7SessionPooling to false) – the most of customers noticed that error has started occurring more frequently after moving to new version of dfc, actually, it is a true, because new pooling implementation tends to keep the amount of dfc sessions as small as possible, so the amount of authentication requests increases
  2. if you use bind_type=bind_search_dn in ldap config switch to bind_by_dn – it will decrease the amount of ldap round-trips
  3. use as nearest ldap servers as possible
  4. put a blame onto EMC – authentication is not a kind of thing which must occur every 5 seconds due to poor application design

Feedback storm

On last week I received a dozen of requests related to the one of recent blogposts, and all those requests contain the same two questions:

  • Why is it password protected?
  • When will it be publicly available?

The short story is: after installing latest Content Server patches I had faced with severe compatibility issues and further research revealed that those compatibility issues are caused by “security enhancements” which do not really look like security fixes, for example, I had spent about 6 hours to find out why my application has started failing when running on latest patchset and how to disable new weird behaviour, and just 30 minutes to write a new proof of concept which bypasses new “security enhancements”. After that I performed a deep analysis of last ten security vulnerabilities, which were announced by EMC as remediated and found the same problem: nothing was fixed, moreover some of them contain so rude mistakes that those mistakes look more like backdoors rather than mistakes. At current state I’m trying to bring together all information I have and this blogpost will be publicly available soon.