A FATAL error has occurred. Part II

Twenty months ago I described a bizarre behaviour in webtop; now it is time to describe how to solve the problem. (Actually, a customer has shared a simple test case: a user changes his password via Ctrl+Alt+Del on a Windows computer, and after that he needs to clear cookies to get webtop working again.) I think the best option here is to replace the user's actual password with a login ticket, and the best place to do that is com.documentum.web.formext.session.AuthenticationService:

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfException;
import com.documentum.fc.common.DfLoginInfo;
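import com.documentum.fc.common.IDfLoginInfo;
import com.documentum.web.formext.session.AuthenticationService;
import com.documentum.web.formext.session.PasswordExpiredException;
import com.documentum.web.formext.session.SessionManagerHttpBinding;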

/**
 * @author Andrey B. Panfilov <andrey@panfilov.tel>
 */
public class AuthenticationServiceCustom extends AuthenticationService {

    public AuthenticationServiceCustom() {
        super();
    }

    @Override
    public void login(HttpSession httpSession, String principalName,
            String docbase, HttpServletRequest req)
        throws DfException {
        super.login(httpSession, principalName, docbase, req);
        replaceTicket(docbase);
    }

    @Override
    public void login(HttpSession httpSession, String principalName,
            String docbase)
        throws DfException {
        super.login(httpSession, principalName, docbase);
        replaceTicket(docbase);
    }

    @Override
    public void login(HttpSession httpSession, String docbase,
            String userLoginName, String userPassword, String domain)
        throws PasswordExpiredException, DfException {
        super.login(httpSession, docbase, userLoginName, userPassword, domain);
        replaceTicket(docbase);
    }

    @Override
    public void login(HttpSession httpSession, String docbase, String domain,
            Object binaryCredential)
        throws DfException {
        super.login(httpSession, docbase, domain, binaryCredential);
        replaceTicket(docbase);
    }

    @Override
    public void login(HttpSession httpSession, String docbase, String domain,
            Object binaryCredential, HttpServletRequest req)
        throws DfException {
        super.login(httpSession, docbase, domain, binaryCredential, req);
        replaceTicket(docbase);
    }

    @Override
    public void login(HttpSession httpSession, String docbase,
            String userLoginName, String password, String domain,
            HttpServletRequest req)
        throws DfException {
        super.login(httpSession, docbase, userLoginName, password, domain, req);
        replaceTicket(docbase);
    }

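    /**
     * Swaps the identity stored in the session manager (user name plus
     * password) for the same user name plus a freshly issued login ticket.
     */
    private void replaceTicket(String docbase) throws DfException {
        IDfSessionManager sessionManager = SessionManagerHttpBinding
                .getSessionManager();
        IDfSession session = null;
        try {
            // Keep only the part before the first dot, in case the name
            // arrives with a suffix (e.g. docbase.server).
            int dotIndex = docbase.indexOf('.');
            if (dotIndex != -1) {
                docbase = docbase.substring(0, dotIndex);
            }
            session = sessionManager.getSession(docbase);
            // Ask for the longest ticket lifetime the server config allows.
            int timeout = session.getServerConfig()
                    .getInt("max_login_ticket_timeout");
            String ticket = session.getLoginTicketEx(null, "docbase", timeout,
                    false, docbase);
            String userName = session.getLoginUserName();
            // Drop the password-based identity before registering the
            // ticket-based one.
            if (sessionManager.hasIdentity(docbase)) {
                sessionManager.clearIdentity(docbase);
            }
            IDfLoginInfo loginInfo = new DfLoginInfo(userName, ticket);
            sessionManager.setIdentity(docbase, loginInfo);
        } finally {
            // Always hand the session back to the session manager.
            if (session != null) {
                sessionManager.release(session);
            }
        }
    }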

}
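
To see why swapping the password for a ticket helps, here is a toy model of the customer's test case. It is only a sketch: ToyRepository and both reconnect methods are hypothetical names invented for illustration, nothing here is a DFC class, and the model ignores the fact that real login tickets expire after their timeout. It assumes only what the post describes: the web session remembers a credential, the password is changed outside of the application, and a later reconnect has to authenticate again with whatever was remembered.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: none of these types are DFC classes. */
public class TicketVsPasswordDemo {

    /** Hypothetical stand-in for a repository that checks credentials. */
    static class ToyRepository {
        private final Map<String, String> passwords = new HashMap<>();
        private final Map<String, String> tickets = new HashMap<>();
        private int ticketCounter = 0;

        void setPassword(String user, String password) {
            passwords.put(user, password);
        }

        /** Issues a ticket that is checked independently of the password. */
        String issueTicket(String user) {
            String ticket = "TICKET-" + (++ticketCounter);
            tickets.put(ticket, user);
            return ticket;
        }

        /** Accepts the current password or a ticket issued to this user. */
        boolean authenticate(String user, String credential) {
            return credential.equals(passwords.get(user))
                    || user.equals(tickets.get(credential));
        }
    }

    /** The web session remembers the password typed at login. */
    public static boolean reconnectWithStoredPassword() {
        ToyRepository repo = new ToyRepository();
        repo.setPassword("alice", "old-secret");
        String stored = "old-secret";
        // The password is changed outside of the application.
        repo.setPassword("alice", "new-secret");
        return repo.authenticate("alice", stored);
    }

    /** The web session remembers a ticket obtained right after login. */
    public static boolean reconnectWithStoredTicket() {
        ToyRepository repo = new ToyRepository();
        repo.setPassword("alice", "old-secret");
        String stored = repo.issueTicket("alice");
        // The password is changed outside of the application.
        repo.setPassword("alice", "new-secret");
        return repo.authenticate("alice", stored);
    }

    public static void main(String[] args) {
        System.out.println("stored password: " + reconnectWithStoredPassword());
        System.out.println("stored ticket:   " + reconnectWithStoredTicket());
    }
}
```

In this model the stale password is rejected on reconnect while the ticket is still accepted, which is the behaviour replaceTicket is after.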

A FATAL error has occurred

Have you ever seen an error like this?

I believe everybody who has tried to deploy Documentum in a large enterprise has faced such spontaneous errors but never paid much attention, because the error message is completely misleading: “[DM_SESSION_E_AUTH_FAIL]error: "Authentication failed for user"” suggests that a dumb user entered an invalid password, so it’s not our issue. But actually, this error reveals a lot of problems related to session management in Documentum.

Root cause

If you take a careful look at the stack trace, it becomes clear that the error originates not from the login page but from somewhere in the intestines of WDK (put any other application here):

at com.documentum.web.formext.privilege.PrivilegeService.getUserPrivilege(PrivilegeService.java:57)
at com.documentum.web.formext.config.PrivilegeQualifier.getScopeValue(PrivilegeQualifier.java:69)
at com.documentum.web.formext.config.BoundedContextCache.retrieveQualifierScopeValue(BoundedContextCache.java:153)
at com.documentum.web.formext.config.ScopeKey.<init>(ScopeKey.java:57)
at com.documentum.web.formext.config.ConfigService.makeScopeKey(ConfigService.java:1482)
at com.documentum.web.formext.config.ConfigService.lookupElement(ConfigService.java:527)

This means that the user had already logged in successfully, so this is obviously not the user’s mistake. So what really happens there? The basic description of the problem is: the user’s dfc session either got reused by another user or was released by the application, and when the application tries to acquire a new dfc session, it fails. So the first question is why the application fails to acquire a new dfc session. I believe there are a lot of reasons, but the most common one for large enterprises is the following: ldap authentication in Documentum is unstable. Most of the time it works as expected, but sometimes it fails and causes hard-to-diagnose issues.
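
The failure mode can be sketched with a toy pool. This is not how dfc is implemented: ToyPool, FlakyAuthenticator and simulate are hypothetical names made up for the illustration, and the "second authentication fails" rule is just a stand-in for an occasional ldap hiccup. The only thing the sketch assumes from the description above is that reusing an idle session costs no authentication, while acquiring a new one does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model only: none of these types are DFC classes. */
public class SessionReacquireDemo {

    /** Hypothetical authenticator that fails on one chosen call. */
    static class FlakyAuthenticator {
        private final int failOnCall;
        private int calls = 0;

        FlakyAuthenticator(int failOnCall) {
            this.failOnCall = failOnCall;
        }

        void authenticate(String user) {
            calls++;
            if (calls == failOnCall) {
                throw new IllegalStateException(
                        "Authentication failed for user " + user);
            }
        }

        int calls() {
            return calls;
        }
    }

    /** Hypothetical pool: a session is represented by its owner's name. */
    static class ToyPool {
        private final Deque<String> idle = new ArrayDeque<>();
        private final FlakyAuthenticator auth;

        ToyPool(FlakyAuthenticator auth) {
            this.auth = auth;
        }

        String acquire(String user) {
            // Reusing the user's idle session needs no authentication.
            if (idle.remove(user)) {
                return user;
            }
            // Otherwise a new session is created, which authenticates again.
            auth.authenticate(user);
            return user;
        }

        void release(String session) {
            idle.push(session);
        }

        /** Models idle sessions being closed or given to someone else. */
        void evictIdle() {
            idle.clear();
        }
    }

    public static String simulate(boolean evictBetweenRequests) {
        FlakyAuthenticator auth = new FlakyAuthenticator(2);
        ToyPool pool = new ToyPool(auth);
        pool.release(pool.acquire("alice")); // login: authentication #1
        if (evictBetweenRequests) {
            pool.evictIdle();
        }
        try {
            pool.release(pool.acquire("alice")); // some later request
            return "ok after " + auth.calls() + " authentication(s)";
        } catch (IllegalStateException e) {
            return "failed mid-request: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(false));
        System.out.println(simulate(true));
    }
}
```

When the idle session survives, the later request never touches the authenticator; once it is gone, the same request has to authenticate again and inherits whatever instability authentication has, long after the login page.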

Mitigation options

  1. disable D7 session pooling in dfc (i.e. set dfc.compatibility.useD7SessionPooling to false): most customers noticed that the error started occurring more frequently after moving to a new version of dfc, and that is true, because the new pooling implementation tends to keep the number of dfc sessions as small as possible, so the number of authentication requests increases
  2. if you use bind_type=bind_search_dn in the ldap config, switch to bind_by_dn: it will decrease the number of ldap round-trips
  3. use the nearest ldap servers possible
  4. put the blame on EMC: authentication is not the kind of thing that must occur every 5 seconds due to poor application design