DFC vs Memory

In Java versions prior to 7 (strictly speaking, prior to 7u6), the java.lang.String class has an extremely controversial implementation: every string is backed by a character array, and all strings created from the same string using the substring() method share the original character array:

 ~]$ groovysh
Groovy Shell (2.4.1, JVM: 1.6.0_45)
Type ':help' or ':h' for help.
-----------------------------------------------------
groovy:000> s="teststring"
===> teststring
groovy:000> s.dump()
===> <java.lang.String@9d966f23 value=teststring offset=0 count=10 hash=-1651085533>
groovy:000> s1=s.substring(0,2)
===> te
groovy:000> s2=s.substring(2,2)
===>
groovy:000> s1.dump()
// value is the same as in the original string, but offset and count differ
===> <java.lang.String@e71 value=teststring offset=0 count=2 hash=3697>
groovy:000> s2.dump()
===> <java.lang.String@0 value=teststring offset=2 count=0 hash=0>

But the copy constructor copies only the characters it actually needs:

// now we create a "brand new" string
groovy:000> s3=new String(s1)
===> te
groovy:000> s3.dump()
===> <java.lang.String@e71 value=te offset=0 count=2 hash=3697>
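If you do not have a Java 6 JVM at hand, the sharing can be illustrated with a minimal sketch of a hypothetical OldString class that mimics the pre-7u6 layout (value, offset, count). It is a simplified model, not the JDK source: substring() returns a new view over the same array, while copyOf() (standing in for the String(String) constructor) copies only the needed range when the array is bigger than the string.

```java
import java.util.Arrays;

// Hypothetical, simplified model of the pre-Java 7u6 String layout (not the JDK source).
public class OldString {
    final char[] value; // backing array, possibly shared with other OldString instances
    final int offset;   // index of the first character in value
    final int count;    // number of characters in this string

    OldString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private OldString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Like substring() in Java 6: no copy, just a new view over the same array.
    OldString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new OldString(value, offset + begin, end - begin);
    }

    // Like new String(String) in Java 6: copies the range if the array is bigger than needed.
    static OldString copyOf(OldString original) {
        if (original.value.length > original.count) {
            char[] v = Arrays.copyOfRange(original.value, original.offset,
                    original.offset + original.count);
            return new OldString(v, 0, original.count);
        }
        return new OldString(original.value, 0, original.count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        OldString s = new OldString("teststring");
        OldString s1 = s.substring(0, 2);
        OldString s3 = copyOf(s1);
        System.out.println(s1 + " shares array: " + (s1.value == s.value)
                + ", array length " + s1.value.length);
        System.out.println(s3 + " shares array: " + (s3.value == s.value)
                + ", array length " + s3.value.length);
    }
}
```

This mirrors the transcript: s1 still points at the 10-character array, while the copy holds a 2-character array of its own.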

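In practice, this means that if you are going to keep substrings in memory for a long period of time, it might be a good idea to create a “brand new” string from each of them, so that the original (possibly much larger) character array can be garbage collected. How is this related to DFC (Documentum Foundation Classes)? DFC has a really weird implementation: it creates “brand new” strings only for values of string attributes, while ID values (including r_object_id read through getString()) remain views over the character array that holds the whole serialized object: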

groovy:000> import com.documentum.com.*
===> com.documentum.com.*
groovy:000> import com.documentum.fc.common.*
===> com.documentum.com.*, com.documentum.fc.common.*
groovy:000> li = new DfLoginInfo("dmadmin", "dmadmin")
===> DfLoginInfo{user=dmadmin, forceAuth=true}
groovy:000> s = new DfClientX().getLocalClient().newSession("ssc_dev", li)
===> com.documentum.fc.client.impl.session.StrongSessionHandle@56092666
groovy:000> d = s.getObjectByQualification("dm_server_config")
===> PROXY@221a5d08[DfSysObject@1c6250d2[....
groovy:000> d.getObjectId().getId().value.length
===> 2782
groovy:000> d.getObjectId().getId().value
===> 2
dm_server_config
3d01ffd780000102 0

OBJ dm_server_config 0 0 0 158
B S 2 A 7 ssc_dev
C S 2 A 16 dm_server_config
D S 2 A 0
E S 2 A 0
F R 2 0
........
groovy:000> d.getObjectName().value.length
===> 7
groovy:000> d.getObjectName().value
===> ssc_dev
groovy:000> d.getFolderId(0).getId().value.length
===> 2782
groovy:000> d.getString("r_object_id").value.length
===> 2782
groovy:000> d.getString("object_name").value.length
===> 7
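Given this behaviour, one way to follow the advice above when caching IDs is to copy each ID string before storing it. Below is a minimal sketch of a hypothetical IdCache class; it stores plain Strings (as returned by getId()) so it runs without DFC, and the serialized buffer in main() is simulated. It uses new String(s.toCharArray()) because that always yields a compact copy: on Java 6 new String(s) would trim the array as well, but from 7u6 onward new String(s) reuses the source array (harmless there, since substring() already copies).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical cache of object IDs; plain Strings stand in for IDfId.getId() values.
public class IdCache {
    private final Set<String> ids = new HashSet<String>();

    // Copies the characters so the stored string never keeps a larger shared array alive.
    static String compactCopy(String s) {
        return new String(s.toCharArray());
    }

    public boolean add(String id) {
        return ids.add(compactCopy(id));
    }

    public boolean contains(String id) {
        return ids.contains(id);
    }

    public int size() {
        return ids.size();
    }

    public static void main(String[] args) {
        // Simulated serialized object: the ID is a 16-character slice of a long buffer.
        StringBuilder buffer = new StringBuilder("3d01ffd780000102 0");
        while (buffer.length() < 2782) {
            buffer.append('x');
        }
        String id = buffer.toString().substring(0, 16);

        IdCache cache = new IdCache();
        cache.add(id);
        System.out.println(cache.contains("3d01ffd780000102") + " " + cache.size());
    }
}
```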

So, if you are going to store object IDs in memory, you may observe really surprising behaviour. You would expect a string holding an object identifier to consume just 72 bytes on a 32-bit JVM: 24 bytes for the String object (8 bytes of object header, 4 bytes for the char array reference and 3*4=12 bytes for the three int fields offset, count and hash) plus 48 bytes for the character array (12 bytes of header plus 16*2=32 bytes for the sixteen characters, padded to 48 by 8-byte alignment). In practice, however, such a string keeps the whole 2782-character array of the serialized object reachable (12 + 2782*2 = 5576 bytes, about 5.5 KB), so it consumes a lot more memory.
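The arithmetic above can be reproduced with a small sketch. The class IdFootprint is hypothetical, and the constants are assumptions matching a 32-bit HotSpot JVM running Java 6 (8-byte object header, 12-byte array header, 4-byte references, 8-byte alignment); real layouts differ on 64-bit JVMs and newer Java versions.

```java
// Rough shallow-size estimates under an assumed 32-bit HotSpot / Java 6 layout.
public class IdFootprint {
    static final int OBJECT_HEADER = 8;
    static final int ARRAY_HEADER = 12; // object header + 4-byte length field
    static final int REFERENCE = 4;
    static final int INT = 4;
    static final int ALIGNMENT = 8;

    // Rounds a raw size up to the next multiple of the alignment.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Java 6 String: header + value reference + offset, count and hash ints.
    static long stringObject() {
        return align(OBJECT_HEADER + REFERENCE + 3 * INT);
    }

    // char[] of the given length, 2 bytes per char.
    static long charArray(int length) {
        return align(ARRAY_HEADER + 2L * length);
    }

    public static void main(String[] args) {
        long expected = stringObject() + charArray(16);  // compact 16-character ID
        long retained = stringObject() + charArray(2782); // ID sharing the whole buffer
        // prints "expected: 72 bytes, retained: 5600 bytes"
        System.out.println("expected: " + expected + " bytes, retained: " + retained + " bytes");
    }
}
```

The 2782 figure is the buffer length shown in the transcript. IDs taken from the same object share one array, so the overhead grows with the number of distinct objects whose IDs you keep.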
