Keep your pathos to yourself

Month ago EMC made a ridiculous announcement:

There are a couple of buggy classes on GitHub, but it is already an “open source initiative”! LOL. But today the content of announcement got changed to something more neutral:

Glad to know that there are people who are able to think soberly in EMC.

Why you need dfc.session.keepalive.enable

Have spent two days for troubleshooting another one DFC issue, however it was worth it.

Two days ago my colleagues started complaining about weird Webtop performance – sometimes it’s response time was good, sometimes it stucks (actually, for their configuration, i.e. 48 Weblogic nodes, it is some kind of “expected” behaviour – the challenge is to find failure nodes). Further investigation revealed that problem nodes got stuck with following stack:

"[STUCK] ExecuteThread: '114' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00007f3624096000 nid=0x7606 runnable [0x00007f34ae6e3000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        - locked <0x000000061f780c10> (a java.lang.Object)
        at com.documentum.fc.impl.util.io.MessageChannel.readSocket(MessageChannel.java:123)
        at com.documentum.fc.impl.util.io.MessageChannel.readLength(MessageChannel.java:100)
        at com.documentum.fc.impl.util.io.MessageChannel.getIncomingMessageLength(MessageChannel.java:92)
        at com.documentum.fc.impl.util.io.MessageChannel.read(MessageChannel.java:77)
        - locked <0x000000061f77edb8> (a com.documentum.fc.impl.util.io.MessageChannel)
        at com.documentum.fc.client.impl.connection.netwise.AbstractNetwiseRpcClient.receiveMessage(AbstractNetwiseRpcClient.java:183)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForObject(NetwiseDocbaseRpcClient.java:638)
        - locked <0x000000061f77edd8> (a com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$8.evaluate(DocbaseConnection.java:1372)
        - locked <0x000000061f77ee28> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1131)
        - locked <0x000000061f77ee28> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForObject(DocbaseConnection.java:1364)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.authenticateUser(DocbaseApi.java:1867)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.authenticate(DocbaseConnection.java:436)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.authenticateConnection(DocbaseConnectionManager.java:356)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.assignConnection(DocbaseConnectionManager.java:202)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection(DocbaseConnectionManager.java:105)
        at com.documentum.fc.client.impl.session.SessionFactory.newSession(SessionFactory.java:23)
        at com.documentum.fc.client.impl.session.PrincipalAwareSessionFactory.newSession(PrincipalAwareSessionFactory.java:44)
        at com.documentum.fc.client.impl.session.PooledSessionFactory.newSession(PooledSessionFactory.java:49)
        at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory(SessionManager.java:134)
        at com.documentum.fc.client.impl.session.SessionManager.newSession(SessionManager.java:72)
        at com.documentum.fc.client.impl.session.SessionManager.getSession(SessionManager.java:191)
        - locked <0x0000000619d5a840> (a com.documentum.fc.client.impl.session.SessionManager)
        at com.documentum.web.formext.docbase.FormatService.queryFormats(FormatService.java:529)
        at com.documentum.web.formext.docbase.FormatService.getFormats(FormatService.java:581)
        - locked <0x00000005edaa41d8> (a java.lang.String)
        at com.documentum.web.formext.docbase.FormatService.getFormatsByFileExtension(FormatService.java:293)
        at com.documentum.web.formext.docbase.FormatService.guessFormatFromFileExtension(FormatService.java:376)
        at com.documentum.web.formext.docbase.FormatService.guessFormatFromFileExtension(FormatService.java:357)
        at com.documentum.web.servlet.FileFormatIconResolver.service(FileFormatIconResolver.java:122)

I have no idea why, but EMC employees love to use following anti-pattern:

synchronized (shared resource) {
  do some docbase stuff w/o
  exclusive access to docbase session
}

for that reason we have replaced a plenty of Webtop classes by our own non-blockingcorrect implementation, and that was exactly this case – wrong pattern. BUT, there was also unusual thing: as it is obvious from stack problem thread got stuck on authentication (“DocbaseConnection.authenticate(DocbaseConnection.java:436)”), typically such operation does not take a lot of time, but weblogic had marked thread as stuck, which in turn means that thread hasn’t changed it’s state for at least last 15 minutes (actual period was 15 hours :)). Sounds weird, doesn’t it? So, what is the issue? I have spent about a day to create a semi-reproducible testcase:

/**
 * @author Andrey B. Panfilov <andrey@panfilov.tel>
 */
public class Test implements Runnable {

	private static final Map<Integer, Thread> THREADS = new HashMap<Integer, Thread>();

	private static final List<AtomicLong> COUNTERS = new ArrayList<AtomicLong>();

	public static final IDfClient CLIENT;

	public static final int THREAD_COUNT = 20;

	public static final AtomicInteger INPROGRESS = new AtomicInteger(THREAD_COUNT);

	public static final Semaphore SEMAPHORE = new Semaphore(THREAD_COUNT);

	static {
		try {
			CLIENT = new DfClientX().getLocalClient();
		} catch (DfException ex) {
			throw new RuntimeException(ex);
		}
	}

	private int _threadNo;

	public Test(int threadNo) {
		_threadNo = threadNo;
	}

	public static void main(String[] args) throws Exception {

		SEMAPHORE.acquire(THREAD_COUNT);

		for (int i = 0; i < THREAD_COUNT; i++) {
			Thread t = new Thread(new Test(i));
			THREADS.put(i, t);
			COUNTERS.add(new AtomicLong(0));
			t.start();
		}

		while (INPROGRESS.get() > 0) {
			Thread.sleep(1000);
		}

		SEMAPHORE.release(THREAD_COUNT);

		int it = 0;
		while (true) {
			it++;
			long[] l = new long[COUNTERS.size()];
			for (int i = 0, n = COUNTERS.size(); i < n; i++) {
				l[i] = COUNTERS.get(i).get();
			}
			Thread.sleep(10000);
			int alive = 0;
			int stuck = 0;
			for (int i = 0, n = COUNTERS.size(); i < n; i++) {
				Thread t = THREADS.get(i);
				boolean isAlive = t.isAlive();
				if (isAlive) {
					alive++;
					long val = COUNTERS.get(i).get();
					if (val == l[i]) {
						stuck++;
					}
				}
			}
			System.out.println("Iteration: " + it + ", alive: " + alive + ", stuck: " + stuck);
			if (alive == 0) {
				return;
			}
		}

	}

	@Override
	public void run() {
		IDfSession session = null;
		try {
			IDfSessionManager sessionManager = CLIENT.newSessionManager();
			DfLoginInfo loginInfo = new DfLoginInfo("dmadmin", "dmadmin");
			loginInfo.setForceAuthentication(false);
			loginInfo.setPeriodicAuthentication(false);
			sessionManager.setIdentity("DCTM_DEV", loginInfo);
			session = sessionManager.getSession("DCTM_DEV");
			INPROGRESS.decrementAndGet();
			SEMAPHORE.acquire();
			while (true) {
				doJob(session);
				COUNTERS.get(_threadNo).incrementAndGet();
			}
		} catch (Exception ex) {
			// ignore
			// do not disconnect because we need to count time properly
		}
	}

	public static void doJob(IDfSession session) throws DfException {
		IDfQuery query = new DfQuery("SELECT user_name FROM dm_user ENABLE(RETURN_TOP 10)");
		IDfCollection collection = query.execute(session, IDfQuery.DF_EXEC_QUERY);
		while (collection.next()) {
			collection.getString("user_name");
		}
		// no finally block because we need to count time properly
		collection.close();
	}

}

which requires a bit tricky execution:

  1. start testcase
  2. after 10-20 seconds perform a hard shutdown of CS host (echo o > /proc/sysrq-trigger)
  3. if after 20 minutes testcase reports that there are stuck threads:
    Iteration: 114, alive: 2, stuck: 2
    Iteration: 115, alive: 2, stuck: 2
    Iteration: 116, alive: 2, stuck: 2
    Iteration: 117, alive: 2, stuck: 2
    Iteration: 118, alive: 2, stuck: 2

    you caught it, congratulations!

Now the interesting thing is when testcase got reproduced we have following execution stack:

"Thread-26" prio=10 tid=0x00007f84682f9000 nid=0x15d06 runnable [0x00007f84617bd000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
        - locked <0x00000000f20e45e8> (a java.lang.Object)
        at com.documentum.fc.impl.util.io.MessageChannel.readSocket(MessageChannel.java:123)
        at com.documentum.fc.impl.util.io.MessageChannel.readLength(MessageChannel.java:100)
        at com.documentum.fc.impl.util.io.MessageChannel.getIncomingMessageLength(MessageChannel.java:92)
        at com.documentum.fc.impl.util.io.MessageChannel.read(MessageChannel.java:77)
        - locked <0x00000000f242b850> (a com.documentum.fc.impl.util.io.MessageChannel)
        at com.documentum.fc.client.impl.connection.netwise.AbstractNetwiseRpcClient.receiveMessage(AbstractNetwiseRpcClient.java:183)
        at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.applyForCollection(NetwiseDocbaseRpcClient.java:414)
        - locked <0x00000000f242b7c8> (a com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$3.evaluate(DocbaseConnection.java:1277)
        - locked <0x00000000f23f5e08> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1139)
        - locked <0x00000000f23f5e08> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForCollection(DocbaseConnection.java:1265)
        at com.documentum.fc.client.impl.docbase.DocbaseApi.exec(DocbaseApi.java:83)
        at com.documentum.fc.client.impl.session.Session.query(Session.java:3630)
        at com.documentum.fc.client.impl.session.SessionHandle.query(SessionHandle.java:2322)
        at com.documentum.fc.client.DfQuery.runQuery(DfQuery.java:167)
        at com.documentum.fc.client.DfQuery.execute(DfQuery.java:216)
        at Test.doJob(Test.java:123)
        at Test.run(Test.java:112)
        at java.lang.Thread.run(Thread.java:662)

and following connection state from OS perspective:

[dmadmin@docu72dev02 ~]$ netstat -na --timers| sed -e '1,2p;/10000/!d'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       Timer
tcp        0      0 ::ffff:172.16.49.138:47424  ::ffff:172.16.49.131:10000  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:172.16.49.138:47406  ::ffff:172.16.49.131:10000  ESTABLISHED off (0.00/0/0)

and this is weird: operating system thinks that connections to CS are idle, but java waits for CS response!. Looks like a bug between java and operating system. Note that it is much simpler to reproduce testcase on SSL-connections – in case of plain connections I rarely get one or two stuck sockets, but in case of SSL even the first attempt created a plenty of them:

[dmadmin@docu72dev02 ~]$  netstat -na --timers| sed -e '1,2p;/10001/!d'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       Timer
tcp        0      0 ::ffff:172.16.49.138:55960  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:172.16.49.138:55951  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:172.16.49.138:55954  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0    122 ::ffff:172.16.49.138:55958  ::ffff:172.16.49.131:10001  ESTABLISHED on (73.96/10/0)
tcp        0    122 ::ffff:172.16.49.138:55961  ::ffff:172.16.49.131:10001  ESTABLISHED on (70.89/10/0)
tcp        0      0 ::ffff:172.16.49.138:55957  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:172.16.49.138:55959  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:172.16.49.138:55950  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0    314 ::ffff:172.16.49.138:55949  ::ffff:172.16.49.131:10001  ESTABLISHED on (72.94/10/0)
tcp        0      0 ::ffff:172.16.49.138:55967  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0    122 ::ffff:172.16.49.138:55955  ::ffff:172.16.49.131:10001  ESTABLISHED on (71.91/10/0)
tcp        0    122 ::ffff:172.16.49.138:55952  ::ffff:172.16.49.131:10001  ESTABLISHED on (69.87/10/0)
tcp        0    122 ::ffff:172.16.49.138:55965  ::ffff:172.16.49.131:10001  ESTABLISHED on (70.89/10/0)
tcp        0    122 ::ffff:172.16.49.138:55948  ::ffff:172.16.49.131:10001  ESTABLISHED on (70.89/10/0)
tcp        0    122 ::ffff:172.16.49.138:55964  ::ffff:172.16.49.131:10001  ESTABLISHED on (70.89/10/0)
tcp        0      0 ::ffff:172.16.49.138:55956  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)
tcp        0    122 ::ffff:172.16.49.138:55962  ::ffff:172.16.49.131:10001  ESTABLISHED on (71.91/10/0)
tcp        0    122 ::ffff:172.16.49.138:55953  ::ffff:172.16.49.131:10001  ESTABLISHED on (70.89/10/0)
tcp        0    122 ::ffff:172.16.49.138:55963  ::ffff:172.16.49.131:10001  ESTABLISHED on (71.91/10/0)
tcp        0      0 ::ffff:172.16.49.138:55966  ::ffff:172.16.49.131:10001  ESTABLISHED off (0.00/0/0)

Now let’s read EMC’s documentation (I previously already criticized EMC’s whitepaper):

DFC application hangs on the Linux machine

If the client machine running the DFC client application is a Linux machine, and if the server to which it is connected goes off the network, the socket takes very long time to timeout, Occasionally, it causes the application server on the DFC client application to stop responding. This problem is related to low-level operating system socket handling. The developer must change the operating system level networking kernel settings and enable dfc.session.keepalive.enable property in the dfc.properties file to resolve this problem. It is also recommended that application administrators collaborate with server administrators before making the change. Server administrators must indetify appropriate settings for the kernel parameters based on their perfromance requirement and deployment.

Operating system level settings for Linux:

  • net.ipv4.tcp_keepalive_time = 1
  • net.ipv4.tcp_keepalive_intvl = 5
  • net.ipv4.tcp_keepalive_probes = 3
  • net.ipv4.tcp_retries2 = 3

What is the problem there? At first, the value of the last parameter (net.ipv4.tcp_retries2) EMC suggests is wrong – this parameter is responsible for RTO retransmissions and it is irrelevant to the situation where “the socket takes very long time to timeout” – 15 minutes is not a “long time”, below is a full description of this parameter:

tcp_retries2 - INTEGER
    This value influences the timeout of an alive TCP connection,
    when RTO retransmissions remain unacknowledged.
    Given a value of N, a hypothetical TCP connection following
    exponential backoff with an initial RTO of TCP_RTO_MIN would
    retransmit N times before killing the connection at the (N+1)th RTO.

    The default value of 15 yields a hypothetical timeout of 924.6
    seconds and is a lower bound for the effective timeout.
    TCP will effectively time out at the first RTO which exceeds the
    hypothetical timeout.

    RFC 1122 recommends at least 100 seconds for the timeout,
    which corresponds to a value of at least 8.

At second, manipulations with keepalives look like an attempt to not solve a problem but create a rude hack: java/operating system does not fire a timer for our connections, let introduce keepalive timeouts instead. However such setting does really work – in case of enabled dfc.session.keepalive.enable I get following connections state:

[dmadmin@docu72dev02 ~]$  netstat -na --timers| sed -e '1,2p;/10001/!d'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       Timer
tcp        0      0 ::ffff:172.16.49.138:55988  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55991  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55999  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55996  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55994  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0    314 ::ffff:172.16.49.138:55989  ::ffff:172.16.49.131:10001  ESTABLISHED on (3.09/5/0)
tcp        0      0 ::ffff:172.16.49.138:55998  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55986  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:56003  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55997  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:56000  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55992  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0    314 ::ffff:172.16.49.138:55985  ::ffff:172.16.49.131:10001  ESTABLISHED on (3.11/5/0)
tcp        0      0 ::ffff:172.16.49.138:55990  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:56001  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:56002  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7143.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55984  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.36/0/0)
tcp        0      0 ::ffff:172.16.49.138:55993  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55995  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)
tcp        0      0 ::ffff:172.16.49.138:55987  ::ffff:172.16.49.131:10001  ESTABLISHED keepalive (7142.37/0/0)

which promises that operating system will down stuck connections after (tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl) seconds.

r_object_id. Type tag

Common structure

[5f][024be9][800390f2]
  │     │       │
  │     │       └─── object id part 
  │     └─── docbase id part
  └──── type tag part  

Type tag

Possible values of type tag are described here
What is the purpose of type tag? Initially I thought that it is specially introduced to improve performance but now I doubt that my thoughts were right. Let me explain. If you read What happens when smart guy does not sit back blogpost, you should know that when Content Server tries to get object’s snapshot from database, CS (in general) performs two SQL queries:

-- determine object type by r_object_id
SELECT dm_dbalias_B.I_TYPE
  FROM DMI_OBJECT_TYPE dm_dbalias_B
 WHERE dm_dbalias_B.R_OBJECT_ID = :dmb_handle
  
-- retrieve data from database
  SELECT *
    FROM TYPE_RV dm_dbalias_B, TYPE_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION

The first query intended to determine object’s type and the second one retrieves data from database. And the question is “Do we need to execute first query to determine object’s type or not?”. And from my perspective the answer is: in some cases it is possible to avoid first select. What are these cases? It is obvious that if object type has a predefined type tag and it is not possible to subtype this object type we do not need to perform select against dmi_object_type table – we already know what tables we should use to retrieve data from database. And indeed, for some cases Content Server follows this concept, and we may check these cases manually: just try to retrieve non-existing object and check what was the last SQL query, for example:

groovy:000> // non-existing acl
===> true
groovy:000> fetch,c,45024be980ffffff
ERROR com.documentum.fc.client.DfIdNotFoundException:
[DM_API_E_EXIST]error:  
 "Document/object specified by 45024be980ffffff does not exist."
groovy:000> ?,c,exec get_last_sql
USER ATTRIBUTES

  result                          : 
SELECT * FROM DM_ACL_S dm_dbalias_B  WHERE dm_dbalias_B.R_OBJECT_ID=:dmb_handle	
 PARAMS :dmb_handle=45024be980ffffff 

but:

groovy:000> // non-existing group
===> true
groovy:000> fetch,c,12024be980ffffff
ERROR com.documentum.fc.client.DfIdNotFoundException:
[DM_API_E_EXIST]error:  
 "Document/object specified by 12024be980ffffff does not exist."
groovy:000> ?,c,exec get_last_sql
USER ATTRIBUTES

  result                          : 
SELECT dm_dbalias_B.I_TYPE FROM DCTM_DEV.DMI_OBJECT_TYPE dm_dbalias_B 
 WHERE dm_dbalias_B.R_OBJECT_ID=:dmb_handle PARAMS :dmb_handle=12024be980ffffff 

groovy:000> ?,c,create type custom_group with supertype dm_group
ERROR com.documentum.fc.common.DfException:
[DM_QUERY_E_BAD_SUPERTYPE]error:  
 "CREATE TYPE:  A non-subtypeable type was specified as supertype."

So, Content Server does know that dm_group is non-subtypeable object type and it has predefined type tag, but still tries to use dmy_object_type table for lookups. Why does the behaviour differ for dm_acl and dm_group types? I believe that the clue is a value of unique_tag attribute in dm_type:

API> ?,c,select name,unique_tag from dm_type where unique_tag<>0
name                         unique_tag
---------------------------  ------------
dmi_subcontent                        100
dm_dump_record                         47
dmi_dump_object_record                 48
dm_load_record                         49
dmi_load_object_record                 50
dmi_queue_item                         27
dmi_package                            73
dmi_session                             1
dmi_transactionlog                     33
dmi_recovery                           72
dm_docbaseid_map                       68
dm_assembly                            13
dmi_dd_type_info                      105
dmi_registry                           38
dm_acl                                 69
dmr_containment                         5
dmi_dd_attr_info                      106
dmi_replica_record                     45
dmi_dist_comp_record                   54
dmr_content                             6
dm_reference                           71
dmi_workitem                           74
dm_partition_scheme                   111
(23 rows affected)

However, this list looks bit strange (let pretend that absence of dm_group is just a mistake)… On the one hand, tell me who the heck is going to subtype following types: dm_process, dm_activity, dm_workflow, dm_server_config, dm_docbase_config, dm_type, dmi_type_info, dm_method, dm_registered, dmi_index, dmi_sequence, dmi_vstamp, dm_policy, etc? Why do not make this types non-subtypeable and improve performance? On the other hand, there is an obvious design gap related to implementation of dm_audittrail – refresh your memory, what is the most problem/fastest growing/largest table in Documentum database? dm_audittrail_s. Below are other performance issues related to dm_audittrail:

  • writing audit is extremely slow – disabling audit improves ingestion rates by 40%, and inserting into dmi_object_type table, obviously, takes a significant part of these 40%
  • presence of audit “records” in dmi_object_type table obviously slowdowns everything
  • dm_audittrail is not partition-capable – EMC sells HVS option for content server, but this options is useless against dm_audittrail
  • deleting dm_audittrail is slow too, you need to use Documentum API to delete these records

But the “funny” thing is EMC does not know that dm_audittrail_s table exists, for example, check their sizing spreadshit – EMC’s estimations do no include dm_audittrail, sizing spreadshit contains tenfold error:

Unfortunately, Content Server is not the only product which improperly uses type tags. Have you ever thought why webtop’s browser tree is extremely slow in expanding folder nodes, especially if current folder contains a lot of documents? Because to get a list of subfolders webtop relies on relation model and performs something like:

SELECT UPPER(object_name), r_object_id, object_name,
 r_object_type, r_link_cnt, r_lock_owner
FROM dm_folder WHERE a_is_hidden=FALSE 
 AND (i_is_reference IS NULL OR i_is_reference = 0) 
 AND FOLDER(ID('<folder_id>'))
ORDER BY 1

instead of taking into account the fact that folder identifiers start with ‘0b’ and executing something like:

SELECT UPPER(object_name), r_object_id, object_name,
 r_object_type, r_link_cnt, r_lock_owner
FROM dm_folder WHERE a_is_hidden=FALSE 
 AND (i_is_reference IS NULL OR i_is_reference = 0) 
 AND FOLDER(ID('<folder_id>'))
 AND r_object_id LIKE '0b%'
ORDER BY 1

Another example. Webtop’s com.documentum.web.form.control.databound.DFCQueryDataHandler (base class for the most of grids) does something weird in getTypeNameForCurrentRow method:

private String getTypeNameForCurrentRow() {
	if (hasColumn("r_object_type")) {
		return getDataField("r_object_type");
	}
	String objectId = getDataField("r_object_id");
	IDfPersistentObject object = getSession().getObject(DfId.valueOf(objectId));
	return object.getType().getName();
}

Being without dfc.properties. Part II

Today I was asked a question related to previous post:

Hi, how to set dynamic values like dfc.data.dir? The value cannot be set via external property because DFC tells me “4545 [main] WARN com.documentum.fc.common.DfPreferences – [DFC_PREFERENCE_NOT_DYNAMIC] Persistent change for preference “dfc.data.dir” will not take effect until the next restart

Any idea how to fix that?
Thanks
Jens

Well, what is the difference between dynamic and non-dynamic DFC preferences? The answer is not trivial as it might seem at first glance. The short answer is: for one kind of preferences DFC has idea how to handle their changes in runtime, for another one it hasn’t, so, “other” preferences are marked as non-dynamic. DFC notifies other objects about changes in preferences through special observers, for example:

Groovy Shell (2.4.6, JVM: 1.8.0_60)
Type ':help' or ':h' for help.
---------------------------------------------------------------------------------
groovy:000> import com.documentum.fc.common.impl.preferences.IPreferencesObserver
===> ...
groovy:000> import com.documentum.fc.common.impl.preferences.TypedPreferences
===> ...
groovy:000> import com.documentum.fc.common.DfPreferences
===> ...
groovy:000> p = DfPreferences.getInstance()
===> DfPreferences@1dcca426
groovy:000> o = new IPreferencesObserver() { 
     void update (TypedPreferences preferences, String preferenceName) {
          println preferenceName + " has been changed"
     } 
}
===> groovysh_evaluate$1@3eff6846
groovy:000> p.addObserver(o)
===> null
groovy:000> p.setString("dfc.docbroker.host","docu72dev01")
dfc.docbroker.host has been changed
===> null

So, from programming perspective non-dynamic preferences are preferences which has no observer implementation responsible for their changes. However, “dynamic” is not a good term for dynamic preferences, because in certain cases DFC does not apply changes immediately, for example:

Groovy Shell (2.4.6, JVM: 1.8.0_60)
Type ':help' or ':h' for help.
---------------------------------------------------------
groovy:000> import com.documentum.fc.common.DfPreferences
===> ...
groovy:000> import com.documentum.fc.common.DfLoginInfo
===> ...
groovy:000> import com.documentum.com.DfClientX
===> ...
groovy:000> c = new DfClientX().getLocalClient()
===> com.documentum.fc.client.DfClient$ClientImpl@4dd752e8
groovy:000> c.newSession("DCTM_DEV",new DfLoginInfo("dmadmin","dmadmin"))
ERROR com.documentum.fc.client.DfServiceException:
[DM_DOCBROKER_E_NO_DOCBROKERS]error:  "No DocBrokers are configured"
        at com.documentum.fc.client.DfServiceException.newNoDocbrokersException (DfServiceException.java:44)
        at com.documentum.fc.client.impl.docbroker.ServerMapBuilder.getMap (ServerMapBuilder.java:50)
        at com.documentum.fc.client.impl.docbroker.DocbrokerClient.getServerMap (DocbrokerClient.java:152)
        at com.documentum.fc.client.impl.connection.docbase.ServerChoiceManager.updateServerChoices (ServerChoiceManager.java:159)
        at com.documentum.fc.client.impl.connection.docbase.ServerChoiceManager.updateServerChoicesIfNecessary (ServerChoiceManager.java:148)
        at com.documentum.fc.client.impl.connection.docbase.ServerChoiceManager.getServerChoices (ServerChoiceManager.java:47)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.getServerChoices (DocbaseConnection.java:273)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.establishNewRpcClient (DocbaseConnection.java:227)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.open (DocbaseConnection.java:126)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.<init> (DocbaseConnection.java:100)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.<init> (DocbaseConnection.java:60)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionFactory.newDocbaseConnection (DocbaseConnectionFactory.java:26)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.createNewConnection (DocbaseConnectionManager.java:180)
        at com.documentum.fc.client.impl.connection.docbase.DocbaseConnectionManager.getDocbaseConnection (DocbaseConnectionManager.java:110)
        at com.documentum.fc.client.impl.session.SessionFactory.newSession (SessionFactory.java:23)
        at com.documentum.fc.client.impl.session.SessionManager.getSessionFromFactory (SessionManager.java:134)
        at com.documentum.fc.client.impl.session.SessionManager.newSession (SessionManager.java:72)
        at com.documentum.fc.client.impl.session.SessionManager.getSession (SessionManager.java:191)
        at com.documentum.fc.client.DfClient$ClientImpl.newSession (DfClient.java:365)
        at com.documentum.fc.client.IDfClient$newSession.call (Unknown Source)
groovy:000> p = DfPreferences.getInstance()
===> DfPreferences@7c4a03a
groovy:000> p.setString("dfc.docbroker.host","docu72dev01")
===> null
groovy:000> c.newSession("DCTM_DEV",new DfLoginInfo("dmadmin","dmadmin"))
===> com.documentum.fc.client.impl.session.StrongSessionHandle@7efa366

but if you already have configured docbrokers and some repositories are registered on these docbrokres DFC will apply changes in “docbroker configuration” only after dfc.session.max_server_choice_age seconds (by default it is 5 minutes) because “docbroker configuration” is backed by a cache (note ServerChoiceManager.updateServerChoicesIfNecessary in stacktrace).

As regards to the question “whether we can setup non-dynamic preferences without using dfc.properties file”, the answer is yes, you can – just call DfPreferences#loadProperties(Properties, true) instead of DfPreferences#loadProperties(Properties, false) but you should understand what are you doing, check some examples below.

Accessing DfClient class two early causes “unexpected behaviour”

Groovy Shell (2.4.6, JVM: 1.8.0_60)
Type ':help' or ':h' for help.
---------------------------------------------------------
groovy:000> import com.documentum.fc.common.DfPreferences
===> ...
groovy:000> import com.documentum.fc.client.DfClient
===> ...
groovy:000> import com.documentum.fc.impl.RuntimeContext
===> ...
groovy:000> DfClient.class
===> class com.documentum.fc.client.DfClient
groovy:000> p = DfPreferences.getInstance()
===> DfPreferences@6cc44207
groovy:000> p.loadProperties(new Properties(["dfc.session.max_count":10]),true)
===> null
groovy:000> p.getString("dfc.session.max_count")
===> 10
groovy:000> r = RuntimeContext.getInstance()
===> com.documentum.fc.impl.RuntimeContext@dbca149
groovy:000> r.m_sessionFactory.m_connectionManager.m_maximumConnections
===> 1000

but:

Groovy Shell (2.4.6, JVM: 1.8.0_60)
Type ':help' or ':h' for help.
---------------------------------------------------------
groovy:000> import com.documentum.fc.common.DfPreferences
===> ...
groovy:000> import com.documentum.fc.client.DfClient
===> ...
groovy:000> import com.documentum.fc.impl.RuntimeContext
===> ...
groovy:000> p = DfPreferences.getInstance()
===> DfPreferences@2cdb5974
groovy:000> p.loadProperties(new Properties(["dfc.session.max_count":10]),true)
===> null
groovy:000> p.getString("dfc.session.max_count")
===> 10
groovy:000> r = RuntimeContext.getInstance()
===> com.documentum.fc.impl.RuntimeContext@696105c4
groovy:000> r.m_sessionFactory.m_connectionManager.m_maximumConnections
===> 10

calling DfPreferences#loadProperties(Properties, true) multiple times erases previously set up preferences:

groovy:000> p.loadProperties(new Properties(["dfc.session.max_count":10]),true)
===> null
groovy:000> p.getString("dfc.session.max_count")
===> 10
groovy:000> p.loadProperties(new Properties(["dfc.docbroker.host":"docu72dev01"]),true)
===> null
groovy:000> p.getString("dfc.session.max_count")
===> 1000
groovy:000> p.getString("dfc.docbroker.host")
===> docu72dev01
groovy:000> p.loadProperties(new Properties(["dfc.docbroker.host":"docu72dev01","dfc.session.max_count":10]),true)
===> null
groovy:000> p.getString("dfc.docbroker.host")
===> docu72dev01
groovy:000> p.getString("dfc.session.max_count")
===> 10

RETURN_RANGE hint

Today I have noticed a new default behaviour of RETURN_RANGE hint:

groovy:000> ?,c,select r_object_id from dm_sysobject  enable(RETURN_RANGE 1 10 'r_object_id ASC')
USER ATTRIBUTES


SYSTEM ATTRIBUTES

  r_object_id                     : 08024be98000013c
....

groovy:000> ?,c,exec get_last_sql
USER ATTRIBUTES

  result                          : 
select all dm_sysobject.r_object_id from dm_sysobject_sp  dm_sysobject where 
  (dm_sysobject.i_has_folder = 1 and dm_sysobject.i_is_deleted = 0) order by r_object_id ASC 300 commit

Previously Content Server was translating DQL query to SQL, but now it filters out redundant records on CS, not database, side. I believe the change in default behaviour is related to consistency between repeating attributes and amount of returned “records”, but let return to 2014:

RETURN_RANGE DQL queries – very limited syntax due to transformation to SQL. Why do not limit returned results at content server level?

🙂

Tricking installer

Last docker post inspired to me to write this one.
Well, imagine that I want to setup dev environment, but during installation I faced with dumb problems:

  • I want to install Oracle 11g, but the only publicly available version is 11.2.0.1 (or 11.2.0.2 if I want to install XE)
  • Documentum installer puts some restrictions on database version:

The solution is:

  • after you get error from installer find minimumDbVersions.ini file in /tmp directory
  • modify it to reflect actual DB version

UPD

Got asked two similar questions, the most thorough was:

got a VM running with CS 7.2 and Oracle XE (from my install notes):

  1. On the “enter license screen”
  2. drop to a shell
  3. fix /opt/documentum/product/7.2/bin/dmdbtest to contain only (w/o quotes): “exit 0”

then proceed the install

dmdbtest performs both version and connectivity checks, when you replace dmdbtest by your own implementation you either need to implement connectivity test or make sure that everything works well, “replace dmdbtest by “exit 0″ scenario” is not a “general-purpose recommendation”.

Dynamic and non-dynamic DFC preferences

Dynamic preferences are preferences which could be modified in runtime via modifying dfc.properties file, and these preferences are:

groovy:000> s = com.documentum.fc.common.DfPreferences.instance
groovy:000> for (int i=0;i<s.attrCount;i++) {name=s.getAttr(i).name;  if(s.getPreference(name).dynamic && !s.getPreference(name).deprecated) {println name}}
dfc.application_code
dfc.batch_hint_size
dfc.search.batch_hint_size
dfc.folder.hint.docbase
dfc.folder.hint.index
dfc.codepage
dfc.date_format
dfc.time_zone
dfc.verify_registration
dfc.bocs.avail.refresh_interval
dfc.acs.check_availability
dfc.acs.config.refresh_interval
dfc.acs.gr.refresh_interval
dfc.bocs.check.keep_number
dfc.cache.enable_persistence
dfc.cache.format.currency_check_interval
dfc.cache.store.currency_check_interval
dfc.cache.type.currency_check_interval
dfc.compatibility.truncate_long_values
dfc.compatibility.return_null_when_no_values
dfc.compatibility.preserve_session_info_messages
dfc.data.local_dir
dfc.content.castore.max_write_attempts
dfc.content.castore.write_sleep_interval
dfc.content.extern_store_content_static
dfc.diagnostics.exception.throw_warnings
dfc.diagnostics.exception.include_stack
dfc.diagnostics.preference.warn_about_rename
dfc.diagnostics.preference.warn_about_deprecation
dfc.diagnostics.warn_about_dmcl_call
dfc.docbroker.host
dfc.docbroker.port
dfc.docbroker.protocol
dfc.docbroker.service
dfc.docbroker.timeout
dfc.docbroker.exclude.failure_threshold
dfc.docbroker.exclusion.time
dfc.exception.include_decoration
dfc.exception.include_id
dfc.globalregistry.username
dfc.globalregistry.password
dfc.globalregistry.repository
dfc.globalregistry.domain
dfc.logging.level_to_force_stack
dfc.logging.verbose
dfc.machine.id
dfc.relationship.schema_expiration_interval
dfc.valueassistance.refresh.interval
dfc.search.external_sources.enable
dfc.search.external_sources.host
dfc.search.external_sources.password
dfc.search.external_sources.port
dfc.search.external_sources.ssl.enable
dfc.search.external_sources.ssl.keystore
dfc.search.external_sources.ssl.keystore_password
dfc.search.external_sources.request_timeout
dfc.search.external_sources.username
dfc.search.external_sources.backup.port
dfc.search.external_sources.backup.host
dfc.search.external_sources.retry.period
dfc.search.fulltext.enable
dfc.search.max_results
dfc.search.max_results_per_source
dfc.search.formatcache.refresh_interval
dfc.search.sourcecache.refresh_interval
dfc.search.typecache.refresh_interval
dfc.search.eos.mappingcache.refresh_interval
dfc.search.xquery.generation.enable
dfc.search.xquery.option.parallel_execution.enable
dfc.search.xquery.option.parallel_summary_computing.enable
dfc.search.matching_terms_computing.enable
dfc.dqlhints.file
dfc.searchranking.file
dfc.security.keystore.password
dfc.security.keystore.privatekey.password
dfc.session.dynamic_delay
dfc.connection.queuing_threshold
dfc.session.max_deadlock_retries
dfc.session.max_error_retries
dfc.session.max_connect_retries
dfc.session.pool.enable
dfc.session.pool.expiration_interval
dfc.connection.reuse_time_limit
dfc.session.reuse_limit
dfc.session.pool.mode
dfc.session.secure_connect_default
dfc.session.allow_trusted_login
dfc.session.pool.reauthenticate_interval
dfc.compatibility.allow_weak_disconnect
dfc.compatibility.useD7SessionPooling
dfc.session.surrogate.check_interval
dfc.session.surrogate.mode
dfc.session.load_balance_strategy
dfc.session.max_server_choice_age
dfc.storagepolicy.enable
dfc.tokenstorage.dir
dfc.tokenstorage.enable
dfc.session.keepalive.enable
dfc.tracing.enable
dfc.tracing.mode
dfc.tracing.verbose
dfc.tracing.file_creation_mode
dfc.tracing.timing_style
dfc.tracing.max_stack_depth
dfc.tracing.thread_name_filter
dfc.tracing.user_name_filter
dfc.tracing.method_name_filter
dfc.tracing.print_stack_on_method_match
dfc.tracing.include_rpcs
dfc.tracing.include_session_id
dfc.tracing.include_rpc_count
dfc.tracing.print_exception_stack
dfc.tracing.date_column_width
dfc.tracing.date_format
dfc.tracing.dir
dfc.tracing.log.category
dfc.tracing.log.level
dfc.tracing.log.additivity
dfc.tracing.file_prefix
dfc.tracing.file_override
dfc.tracing.max_backup_index
dfc.tracing.max_file_size
dfc.xml.encode_special_chars_in_attrs
dfc.xml.suppress_default_namespace_decl
dfc.ldap.bof.validate_password
dfc.lwo.allow_propagating_changes
dfc.lwo.log_materialize_stack
dfc.batchmanager.max_batch_size
dfc.storage.filestore_accelerator.enable
dfc.crypto.repository
dfc.connection.unused_connection_timeout
dfc.connection.reuse_threshold
dfc.connection.connection_reuse_matching_user_bias
dfc.connection.cleanup_check_window
dfc.connection.statistics_history_count
dfc.connection.profile_connections
dfc.connection.trace_connections_only
dfc.connection.queue_wait_time
dfc.connection.out_of_sessions_wait_time
dfc.license.cache_expiry_interval
dfc.license.server_reconnect_interval
dfc.license.metrics_log_frequency

in order to change non-dynamic preference you need to restart JVM, these preferences are:

groovy:000> s = com.documentum.fc.common.DfPreferences.instance
groovy:000> for (int i=0;i<s.attrCount;i++) {name=s.getAttr(i).name;  if(!s.getPreference(name).dynamic && !s.getPreference(name).deprecated) {println name}}
dfc.name
dfc.config.dir
dfc.config.file
dfc.data.dir
dfc.data.user_dir
dfc.data.checkout_dir
dfc.data.export_dir
dfc.data.local_diskfull_limit
dfc.data.local_diskfull_check_interval
dfc.data.local_clean_on_init
dfc.data.local_purge_on_diskfull
dfc.registry.mode
dfc.permit.grant.write.user_default_folder
dfc.registry.file
dfc.config.check_interval
dfc.query.object_name_for_docbase
dfc.query.should_include_object_name
dfc.locale
dfc.version
dfc.acs.avail.refresh_interval
dfc.acs.protocol.handler.class_name
dfc.acs.protocol.handler.protocol
dfc.acs.request.expiration_interval
dfc.admin.ldif_file_charset
dfc.appledouble.resource_file_ext
dfc.bocs.check.http_method
dfc.bof.cache.append_name
dfc.bof.cache.currency_check_interval
dfc.bof.cache.cleanup_interval
dfc.bof.cache.enable_preload
dfc.bof.classloader.enable_extension_loader_first
dfc.cache.dir
dfc.cache.write_interval
dfc.cache.ddinfo.size
dfc.cache.object.size
dfc.cache.query.size
dfc.cache.type.size
dfc.cache.currency_scoping.enable
dfc.cache.currency_scoping.query_result_max_count
dfc.data.umask
dfc.content.use_compression
dfc.content.use_content_server
dfc.diagnostics.resources.enable
dfc.docbroker.auto_request_forward
dfc.docbroker.search_order
dfc.docbroker.debug.docbase_id
dfc.docbroker.debug.host
dfc.docbroker.debug.port
dfc.docbroker.debug.service
dfc.kerberos.mapped_docbaseSPN
dfc.kerberos.docbaseName
dfc.globalregistry.connect_attempt_interval
dfc.object.id_batch_size
dfc.privilege.enable
dfc.privilege.use_anonymous_hostname
dfc.reference.binding_label
dfc.validation.allow.empty_string.list_complete
dfc.search.docbase.broker_count
dfc.search.external_sources.adapter.domain
dfc.search.external_sources.broker_count
dfc.search.external_sources.rmi_name
dfc.security.keystore.file
dfc.session.max_collection_count
dfc.session.max_count
dfc.session.connect_timeout
dfc.session.connect_retry_interval
dfc.session.global_pool_enabled
dfc.aspect.virtual_default_aspect_supported
dfc.storagepolicy.ignore_rule_errors
dfc.storagepolicy.validation_interval
dfc.validation.expr.currency_check_interval
dfc.validation.expr.debug.all
dfc.validation.expr.debug.eval
dfc.validation.expr.debug.code
dfc.validation.expr.debug.tree
dfc.validation.expr.disable_java
dfc.validation.expr.load_native_library
dfc.validation.overrides.currency_check_interval
dfc.validation.overrides.enable
dfc.vdm.max_child_flush_count
dfc.xml.retain_early_binding_to_current
dfc.xml.record_inline_descendants
dfc.xml.use_strict_uri
dfc.jmx.connector.port
dfc.jmx.connector.readonly.user
dfc.jmx.connector.readonly.password
dfc.jmx.connector.readwrite.user
dfc.jmx.connector.readwrite.password
dfc.jmx.mbeanserver.domain
dfc.jmx.mbean_server_provider
dfc.operations.customization.version
dfc.content.has_store_access
dfc.housekeeping.interval
dfc.housekeeping.viewed_file_retention_period
dfc.internal.purge_far_connections
dfc.license.core_thread_count
dfc.license.max_queue_size
dfc.license.max_cache_size
dfc.security.ssl.truststore
dfc.security.ssl.truststore_password
dfc.security.ssl.use_existing_truststore