Q & A. XI

Hopefully, not too boring – if it is, please just reply “sorry, too boring” so I don’t get my hopes up.

I stumbled across this post, which leads me to believe you may have more insights to sessions management than most – Session management. Misconceptions

I was a bit disturbed by you comment about the white-paper – I was so excited when I found it, now you say it’s filled with misconceptions – yikes!

I have a Web ‘application’ that serves as a public site – no logins at all. When I first took over this site our Content Server was crashing constantly, running out of memory. Why…because the public application was creating a new session for each and every user of the site. The same content sever also supports a non-public login-based site, meaning every user has their own Documentum login. This application historically used a lot of CS memory for each session on the CS (not a 1-to-1 I know).

We started having an issue (out of nowhere) this week which severely impacted the public application. For the past 5-6 years this application has run supporting up-to 300-concurrent user off of ~2 client DFC sessions. We are able to do this because the majority of our requests are NOT served by DCTM, instead we use a full-text index (not Xplore). We finally had to take the entire site offline to try to figure out what the heck was going on.

A few pieces of info led us to believe that the issues were related to a slow storage unit backing one of our Filestores. With no traffic flowing to the app, one person tried to download a content file…while waiting for about 1-minute, another user hit another page that runs 3 simple DQL queries. From the application side of things, we saw no response on the page with the queries, UNTIL the content download had returned. This sure appears to be a threading/contention issue…

We have run a few test since, but haven’t yet nailed down anything. I can tell you that we use @Singleton to mark our application’s single session manager. That session manager provides a session via getSession() to each class that asks for one – regardless of http session or request. In my attempt to bandaid the issue, I switched the session request for the slow operating file download to newSession(). Not only did this seem to prevent other requests from waiting on the slow download, it also resulted in lightening fast downloads suddenly. I was concerned that this might cause too many sessions, as some people scrape our site for the content downloads – however so far we’ve only got up-to 15 sessions on the Content Servers from the ‘public’ user.

I am not super experienced with threading, but a quick play around in the code, adding a thread sleep after a getSession() is called, then opening a new browser tab and asking for another page did not show any waiting. This makes me think it’s something within the Session itself that has a threading issue?

Thanks for any thoughts you can share!

As regards your feelings: in my universe (not EMC’s) white-papers are not intended to fill documentation gaps, white-paper may be a good starting point for working with product/technology, but nothing else, and white-papers typically have information about authors (this gives reader a clue about author’s competence, EMC’s white-papers do not have such information, so I consider all EMC’s white-papers as a piece of shit just because some of them are wrong and misleading), product versions, purpose and target audience. But EMC instead of improving their documentation tries to fill documentation gaps by white-papers, the problem is white-papers are not documents and they do not provide a contract about product behaviour (documentation does).

As regards your problem…

When I first took over this site our Content Server was crashing constantly, running out of memory. Why…because the public application was creating a new session for each and every user of the site.

I believe you are talking about some ancient 32-bit content server builds for MS Windows. Indeed, there was such problem because on the one hand MS Windows has a limits on concurrent TCP connections, but these limits are far beyond the practical sense, on the other hand on MS Windows Content Server services client connections using threads within a single process, so the amount of concurrent session Content Server can create is also limited by amount of memory single process can consume – for 32-bit builds this limit is about 2Gb (I’m not a MS Windows expert, so, real values may differ, but the main idea is correct). Also, there is an interesting fact for UNIX builds. On UNIX Content Server services client connections using separate processes, so technically the amount of concurrent client connections is limited by amount of memory presented on CS host, but even in this case EMC put dumb limits on amount of concurrent sessions.

With no traffic flowing to the app, one person tried to download a content file…while waiting for about 1-minute, another user hit another page that runs 3 simple DQL queries. From the application side of things, we saw no response on the page with the queries, UNTIL the content download had returned. This sure appears to be a threading/contention issue…

I do think that either your diagnosis is wrong or you do not provide enough information or you have missed something (actually, taking a thread dump will provide you more clues than just analysing code). Let me explain. You have a hypothesis that “when we share DFC session among threads and and one thread performs transfer of content other threads are unable to execute DQL queries”, relying on the input data you have provided I can say without a doubt that this hypothesis is clearly untenable, the justification of my statement is: DFC performs content transfer from Content Server using 64K chunks, in order to request a chunk DFC sends RPC to content server, when DFC executes DQL queries it also sends RPCs to content server, RPC calls are serialised (you can’t send multiple RPC calls simultaneously) – there is a contention between threads but behaviour differs from yours. Below is a proof of concept – I have created a document of 1Gb worth and try to download it and perform some queries simultaneously, my expectation is if “synchronised” keyword in java provided fair scheduling my code would able to perform about 1024*1024/64/2=8192 (1024*1024/64 – is a number of RPCs required to transfer content, 2 – is a number of RPCs required to execute query) “simultaneous” queries:

public static void main(String[] args) throws Exception {
	final IDfClient client = new DfClientX().getLocalClient();
	client.getClientConfig().setString(DfPreferences.DFC_DOCBROKER_HOST, "192.168.0.253");
	final IDfSession session = client.newSession("DCTM_DEV", new DfLoginInfo("dmadmin", "dmadmin"));
	IDfSysObject sysObject = (IDfSysObject) session.getObject(DfId.valueOf("08024be980023d9d"));
	Thread t1 = new Thread(new Runnable() {
		public void run() {
			try {
				((IDfSysObject) session.getObject(DfId.valueOf("08024be980023d9d"))).getFile(null);
			} catch (DfException ex) {
				throw new RuntimeException(ex);
			}
		}
	});
	Thread t2 = new Thread(new Runnable() {
		public void run() {
			int count = 0;
			try {
				while (!Thread.currentThread().isInterrupted()) {
					IDfQuery q = new DfQuery("select count(*) from dm_server_config");
					q.execute(session, IDfQuery.DF_EXEC_QUERY).close();
					count++;
				}
			} catch (Exception ex) {
				// ignore
			}
			System.out.println(count);
		}
	});
	t1.start();
	t2.start();
	t1.join();
	t2.interrupt();
}

actual result is about 9000-10000 queries and corresponding thread dumps are:

"Thread-9" prio=5 tid=0x00007ff3739b6800 nid=0x5b07 waiting for monitor entry [0x000070000185e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$IteratorSynchronizer.getType(DocbaseConnection.java:1971)
	- waiting to lock <0x0000000703009670> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.collection.TypedDataCollection.getTypedData(TypedDataCollection.java:39)
	at com.documentum.fc.client.impl.collection.TypedDataCollection.<init>(TypedDataCollection.java:46)
	at com.documentum.fc.client.impl.collection.TypedDataCollection.newInstance(TypedDataCollection.java:31)
	at com.documentum.fc.client.DfQuery.runQuery(DfQuery.java:167)
	at com.documentum.fc.client.DfQuery.execute(DfQuery.java:216)
	at QAXI$2.run(QAXI.java:50)
	at java.lang.Thread.run(Thread.java:745)

"Thread-8" prio=5 tid=0x00007ff3738ae800 nid=0x360b runnable [0x000070000175a000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
	at sun.security.ssl.InputRecord.read(InputRecord.java:509)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
	- locked <0x0000000703141a80> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	- locked <0x000000070322d4c0> (a sun.security.ssl.AppInputStream)
	at com.documentum.fc.impl.util.io.MessageChannel.readSocket(MessageChannel.java:129)
	at com.documentum.fc.impl.util.io.MessageChannel.readLength(MessageChannel.java:100)
	at com.documentum.fc.impl.util.io.MessageChannel.getIncomingMessageLength(MessageChannel.java:92)
	at com.documentum.fc.impl.util.io.MessageChannel.read(MessageChannel.java:77)
	- locked <0x00000007031b50f0> (a com.documentum.fc.impl.util.io.MessageChannel)
	at com.documentum.fc.client.impl.connection.netwise.AbstractNetwiseRpcClient.receiveMessage(AbstractNetwiseRpcClient.java:183)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.getBlock(NetwiseDocbaseRpcClient.java:1023)
	- locked <0x00000007031b5068> (a com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.getBlock(DocbaseConnection.java:1473)
	- locked <0x0000000703009670> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.RawPuller.getBlock(RawPuller.java:52)
	- locked <0x0000000703009670> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.content.impl.BlockPuller.nextBlock(BlockPuller.java:49)
	at com.documentum.fc.client.content.impl.PullerInputStream.getNextBuffer(PullerInputStream.java:73)
	at com.documentum.fc.client.content.impl.PullerInputStream.ensureBufferHasData(PullerInputStream.java:63)
	at com.documentum.fc.client.content.impl.PullerInputStream.read(PullerInputStream.java:88)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:50)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:30)
	at com.documentum.fc.client.content.impl.LocalContentFile.<init>(LocalContentFile.java:38)
	at com.documentum.fc.client.content.impl.LocalContentManager$LocalContentDirectory.createFile(LocalContentManager.java:480)
	at com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager.createLocalContentFile(LocalContentManager.java:363)
	- locked <0x00000007cd220198> (a com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager)
	at com.documentum.fc.client.content.impl.LocalContentManager.createContentFile(LocalContentManager.java:148)
	at com.documentum.fc.client.content.impl.ContentManager.namelessGetFile(ContentManager.java:253)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:198)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:173)
	at com.documentum.fc.client.DfSysObject.getFileEx2(DfSysObject.java:1972)
	- locked <0x000000070310dfe0> (a com.documentum.fc.client.DfSysObject)
	at com.documentum.fc.client.DfSysObject.getFileEx(DfSysObject.java:1964)
	at com.documentum.fc.client.DfSysObject.getFile(DfSysObject.java:1959)
	at com.documentum.fc.client.DfSysObject___PROXY.getFile(DfSysObject___PROXY.java)
	at QAXI$1.run(QAXI.java:38)
	at java.lang.Thread.run(Thread.java:745)

or:

"Thread-9" prio=5 tid=0x00007fc9f322f000 nid=0x5b0f runnable [0x000070000185e000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:934)
	- locked <0x00000007b2bb45b0> (a java.lang.Object)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:891)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	- locked <0x00000007b2bb4660> (a sun.security.ssl.AppInputStream)
	at com.documentum.fc.impl.util.io.MessageChannel.readSocket(MessageChannel.java:129)
	at com.documentum.fc.impl.util.io.MessageChannel.readLength(MessageChannel.java:100)
	at com.documentum.fc.impl.util.io.MessageChannel.getIncomingMessageLength(MessageChannel.java:92)
	at com.documentum.fc.impl.util.io.MessageChannel.read(MessageChannel.java:77)
	- locked <0x00000007b2bb4690> (a com.documentum.fc.impl.util.io.MessageChannel)
	at com.documentum.fc.client.impl.connection.netwise.AbstractNetwiseRpcClient.receiveMessage(AbstractNetwiseRpcClient.java:183)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient.closeCollection(NetwiseDocbaseRpcClient.java:934)
	- locked <0x00000007b2bb4710> (a com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient)
	at com.documentum.fc.client.impl.connection.docbase.netwise.NetwiseDocbaseRpcClient$TypedDataIterator.close(NetwiseDocbaseRpcClient.java:1366)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection$IteratorSynchronizer.close(DocbaseConnection.java:1963)
	- locked <0x00000007b2bb4790> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.collection.TypedDataCollection.close(TypedDataCollection.java:54)
	- locked <0x00000007b10120d0> (a com.documentum.fc.client.impl.collection.TypedDataCollection)
	at com.documentum.fc.client.impl.collection.CollectionHandle.close(CollectionHandle.java:42)
	at QAXI$2.run(QAXI.java:50)
	at java.lang.Thread.run(Thread.java:745)

"Thread-8" prio=5 tid=0x00007fc9f322e000 nid=0x320b waiting for monitor entry [0x000070000175a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.documentum.fc.client.impl.connection.docbase.RawPuller.getBlock(RawPuller.java:52)
	- waiting to lock <0x00000007b2bb4790> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.content.impl.BlockPuller.nextBlock(BlockPuller.java:49)
	at com.documentum.fc.client.content.impl.PullerInputStream.getNextBuffer(PullerInputStream.java:73)
	at com.documentum.fc.client.content.impl.PullerInputStream.ensureBufferHasData(PullerInputStream.java:63)
	at com.documentum.fc.client.content.impl.PullerInputStream.read(PullerInputStream.java:88)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:50)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:30)
	at com.documentum.fc.client.content.impl.LocalContentFile.<init>(LocalContentFile.java:38)
	at com.documentum.fc.client.content.impl.LocalContentManager$LocalContentDirectory.createFile(LocalContentManager.java:480)
	at com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager.createLocalContentFile(LocalContentManager.java:363)
	- locked <0x00000007abec1350> (a com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager)
	at com.documentum.fc.client.content.impl.LocalContentManager.createContentFile(LocalContentManager.java:148)
	at com.documentum.fc.client.content.impl.ContentManager.namelessGetFile(ContentManager.java:253)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:198)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:173)
	at com.documentum.fc.client.DfSysObject.getFileEx2(DfSysObject.java:1972)
	- locked <0x00000007b2b8e5b8> (a com.documentum.fc.client.DfSysObject)
	at com.documentum.fc.client.DfSysObject.getFileEx(DfSysObject.java:1964)
	at com.documentum.fc.client.DfSysObject.getFile(DfSysObject.java:1959)
	at com.documentum.fc.client.DfSysObject___PROXY.getFile(DfSysObject___PROXY.java)
	at QAXI$1.run(QAXI.java:38)
	at java.lang.Thread.run(Thread.java:745)

Note, if I introduce transactions in my PoC I will get a behaviour you have described (I do not want to say that using transactions is a clue for your problem, just start your troubleshooting with taking thread dump):

public static void main(String[] args) throws Exception {
	final IDfClient client = new DfClientX().getLocalClient();
	client.getClientConfig().setString(DfPreferences.DFC_DOCBROKER_HOST, "192.168.0.253");
	final IDfSession session = client.newSession("DCTM_DEV", new DfLoginInfo("dmadmin", "dmadmin"));
	IDfSysObject sysObject = (IDfSysObject) session.getObject(DfId.valueOf("08024be980023d9d"));
	Thread t1 = new Thread(new Runnable() {
		public void run() {
			try {
				session.beginTrans();
				((IDfSysObject) session.getObject(DfId.valueOf("08024be980023d9d"))).getFile(null);
			} catch (DfException ex) {
				throw new RuntimeException(ex);
			} finally {
				try {
					session.abortTrans();
				} catch (DfException ex) {
					// ignore
				}
			}
		}
	});
	Thread t2 = new Thread(new Runnable() {
		public void run() {
			int count = 0;
			try {
				while (!Thread.currentThread().isInterrupted()) {
					IDfQuery q = new DfQuery("select count(*) from dm_server_config");
					q.execute(session, IDfQuery.DF_EXEC_QUERY).close();
					count++;
				}
			} catch (Exception ex) {
				// ignore
			}
			System.out.println(count);
		}
	});
	t1.start();
	t2.start();
	t1.join();
	t2.interrupt();
}

result is 2-3 (actually should be 1 but writing a full PoC is bit boring) “simultaneous” queries and thread dump looks like:

"Thread-9" prio=5 tid=0x00007fb0c2a91800 nid=0x5d07 in Object.wait() [0x000070000185e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000007bae39970> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.waitForCorrectSharingContext(DocbaseConnection.java:820)
	- locked <0x00000007bae39970> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.evaluateRpc(DocbaseConnection.java:1108)
	- locked <0x00000007bae39970> (a com.documentum.fc.client.impl.connection.docbase.DocbaseConnection)
	at com.documentum.fc.client.impl.connection.docbase.DocbaseConnection.applyForCollection(DocbaseConnection.java:1265)
	at com.documentum.fc.client.impl.docbase.DocbaseApi.exec(DocbaseApi.java:83)
	at com.documentum.fc.client.impl.session.Session.query(Session.java:3630)
	at com.documentum.fc.client.impl.session.SessionHandle.query(SessionHandle.java:2322)
	at com.documentum.fc.client.DfQuery.runQuery(DfQuery.java:167)
	at com.documentum.fc.client.DfQuery.execute(DfQuery.java:216)
	at QAXI$2.run(QAXI.java:57)
	at java.lang.Thread.run(Thread.java:745)

"Thread-8" prio=5 tid=0x00007fb0c3887800 nid=0x1307 runnable [0x000070000175a000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:345)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:51)
	at com.documentum.fc.impl.util.io.StreamUtility.copyContents(StreamUtility.java:30)
	at com.documentum.fc.client.content.impl.LocalContentFile.<init>(LocalContentFile.java:38)
	at com.documentum.fc.client.content.impl.LocalContentManager$LocalContentDirectory.createFile(LocalContentManager.java:480)
	at com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager.createLocalContentFile(LocalContentManager.java:363)
	- locked <0x00000007bae93480> (a com.documentum.fc.client.content.impl.LocalContentManager$SessionLocalContentManager)
	at com.documentum.fc.client.content.impl.LocalContentManager.createContentFile(LocalContentManager.java:148)
	at com.documentum.fc.client.content.impl.ContentManager.namelessGetFile(ContentManager.java:253)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:198)
	at com.documentum.fc.client.content.impl.ContentManager.getFile(ContentManager.java:173)
	at com.documentum.fc.client.DfSysObject.getFileEx2(DfSysObject.java:1972)
	- locked <0x00000007bb011c60> (a com.documentum.fc.client.DfSysObject)
	at com.documentum.fc.client.DfSysObject.getFileEx(DfSysObject.java:1964)
	at com.documentum.fc.client.DfSysObject.getFile(DfSysObject.java:1959)
	at com.documentum.fc.client.DfSysObject___PROXY.getFile(DfSysObject___PROXY.java)
	at QAXI$1.run(QAXI.java:39)
	at java.lang.Thread.run(Thread.java:745)

I can tell you that we use @Singleton to mark our application’s single session manager. That session manager provides a session via getSession() to each class that asks for one – regardless of http session or request. In my attempt to bandaid the issue, I switched the session request for the slow operating file download to newSession(). Not only did this seem to prevent other requests from waiting on the slow download, it also resulted in lightening fast downloads suddenly. I was concerned that this might cause too many sessions, as some people scrape our site for the content downloads – however so far we’ve only got up-to 15 sessions on the Content Servers from the ‘public’ user.

Sharing sessions/session managers among thread is not safe, using newSession() is not safe too – please check my explanation on ECN. If you want to find a balance between performance and amount of content server sessions consider to implement a pool of session managers – you can simply adopt Commons Pool. Also, check how you transfer content from Content Server. I also may suggest to use ACS for content transfer operations but this will complicate your solution and most probably will have no benefits due to following reasons:

  • you will start depending on JMS, so you will need a backup plan for handling JMS availability issues
  • when creating ACS link Content Server generates SHA1 checksum for content – if your storage is slow it may be an issue because the content will be read twice: one time for generating SHA1 checksum, second time when sending content
  • never ever use this pattern for getting ACS urls – if ACS is unavailable IDfExportOperation transfers content to DFC host, use com.documentum.fc.client.IDfSysObject#getAcsRequests instead

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s