r_object_id. Type tag

Common structure

[5f][024be9][800390f2]
  │     │       │
  │     │       └─── object id part 
  │     └─── docbase id part
  └──── type tag part  

Type tag

Possible values of type tag are described here
What is the purpose of type tag? Initially I thought that it is specially introduced to improve performance but now I doubt that my thoughts were right. Let me explain. If you read What happens when smart guy does not sit back blogpost, you should know that when Content Server tries to get object’s snapshot from database, CS (in general) performs two SQL queries:

-- determine object type by r_object_id
SELECT dm_dbalias_B.I_TYPE
  FROM DMI_OBJECT_TYPE dm_dbalias_B
 WHERE dm_dbalias_B.R_OBJECT_ID = :dmb_handle
  
-- retrieve data from database
  SELECT *
    FROM TYPE_RV dm_dbalias_B, TYPE_SV dm_dbalias_C
   WHERE (    dm_dbalias_C.R_OBJECT_ID = :dmb_handle
          AND dm_dbalias_C.R_OBJECT_ID = dm_dbalias_B.R_OBJECT_ID)
ORDER BY dm_dbalias_B.R_OBJECT_ID, dm_dbalias_B.I_POSITION

The first query intended to determine object’s type and the second one retrieves data from database. And the question is “Do we need to execute first query to determine object’s type or not?”. And from my perspective the answer is: in some cases it is possible to avoid first select. What are these cases? It is obvious that if object type has a predefined type tag and it is not possible to subtype this object type we do not need to perform select against dmi_object_type table – we already know what tables we should use to retrieve data from database. And indeed, for some cases Content Server follows this concept, and we may check these cases manually: just try to retrieve non-existing object and check what was the last SQL query, for example:

groovy:000> // non-existing acl
===> true
groovy:000> fetch,c,45024be980ffffff
ERROR com.documentum.fc.client.DfIdNotFoundException:
[DM_API_E_EXIST]error:  
 "Document/object specified by 45024be980ffffff does not exist."
groovy:000> ?,c,exec get_last_sql
USER ATTRIBUTES

  result                          : 
SELECT * FROM DM_ACL_S dm_dbalias_B  WHERE dm_dbalias_B.R_OBJECT_ID=:dmb_handle	
 PARAMS :dmb_handle=45024be980ffffff 

but:

groovy:000> // non-existing group
===> true
groovy:000> fetch,c,12024be980ffffff
ERROR com.documentum.fc.client.DfIdNotFoundException:
[DM_API_E_EXIST]error:  
 "Document/object specified by 12024be980ffffff does not exist."
groovy:000> ?,c,exec get_last_sql
USER ATTRIBUTES

  result                          : 
SELECT dm_dbalias_B.I_TYPE FROM DCTM_DEV.DMI_OBJECT_TYPE dm_dbalias_B 
 WHERE dm_dbalias_B.R_OBJECT_ID=:dmb_handle PARAMS :dmb_handle=12024be980ffffff 

groovy:000> ?,c,create type custom_group with supertype dm_group
ERROR com.documentum.fc.common.DfException:
[DM_QUERY_E_BAD_SUPERTYPE]error:  
 "CREATE TYPE:  A non-subtypeable type was specified as supertype."

So, Content Server does know that dm_group is non-subtypeable object type and it has predefined type tag, but still tries to use dmy_object_type table for lookups. Why does the behaviour differ for dm_acl and dm_group types? I believe that the clue is a value of unique_tag attribute in dm_type:

API> ?,c,select name,unique_tag from dm_type where unique_tag<>0
name                         unique_tag
---------------------------  ------------
dmi_subcontent                        100
dm_dump_record                         47
dmi_dump_object_record                 48
dm_load_record                         49
dmi_load_object_record                 50
dmi_queue_item                         27
dmi_package                            73
dmi_session                             1
dmi_transactionlog                     33
dmi_recovery                           72
dm_docbaseid_map                       68
dm_assembly                            13
dmi_dd_type_info                      105
dmi_registry                           38
dm_acl                                 69
dmr_containment                         5
dmi_dd_attr_info                      106
dmi_replica_record                     45
dmi_dist_comp_record                   54
dmr_content                             6
dm_reference                           71
dmi_workitem                           74
dm_partition_scheme                   111
(23 rows affected)

However, this list looks bit strange (let pretend that absence of dm_group is just a mistake)… On the one hand, tell me who the heck is going to subtype following types: dm_process, dm_activity, dm_workflow, dm_server_config, dm_docbase_config, dm_type, dmi_type_info, dm_method, dm_registered, dmi_index, dmi_sequence, dmi_vstamp, dm_policy, etc? Why do not make this types non-subtypeable and improve performance? On the other hand, there is an obvious design gap related to implementation of dm_audittrail – refresh your memory, what is the most problem/fastest growing/largest table in Documentum database? dm_audittrail_s. Below are other performance issues related to dm_audittrail:

  • writing audit is extremely slow – disabling audit improves ingestion rates by 40%, and inserting into dmi_object_type table, obviously, takes a significant part of these 40%
  • presence of audit “records” in dmi_object_type table obviously slowdowns everything
  • dm_audittrail is not partition-capable – EMC sells HVS option for content server, but this options is useless against dm_audittrail
  • deleting dm_audittrail is slow too, you need to use Documentum API to delete these records

But the “funny” thing is EMC does not know that dm_audittrail_s table exists, for example, check their sizing spreadshit – EMC’s estimations do no include dm_audittrail, sizing spreadshit contains tenfold error:

Unfortunately, Content Server is not the only product which improperly uses type tags. Have you ever thought why webtop’s browser tree is extremely slow in expanding folder nodes, especially if current folder contains a lot of documents? Because to get a list of subfolders webtop relies on relation model and performs something like:

SELECT UPPER(object_name), r_object_id, object_name,
 r_object_type, r_link_cnt, r_lock_owner
FROM dm_folder WHERE a_is_hidden=FALSE 
 AND (i_is_reference IS NULL OR i_is_reference = 0) 
 AND FOLDER(ID('<folder_id>'))
ORDER BY 1

instead of taking into account the fact that folder identifiers start with ‘0b’ and executing something like:

SELECT UPPER(object_name), r_object_id, object_name,
 r_object_type, r_link_cnt, r_lock_owner
FROM dm_folder WHERE a_is_hidden=FALSE 
 AND (i_is_reference IS NULL OR i_is_reference = 0) 
 AND FOLDER(ID('<folder_id>'))
 AND r_object_id LIKE '0b%'
ORDER BY 1

Another example. Webtop’s com.documentum.web.form.control.databound.DFCQueryDataHandler (base class for the most of grids) does something weird in getTypeNameForCurrentRow method:

private String getTypeNameForCurrentRow() {
	if (hasColumn("r_object_type")) {
		return getDataField("r_object_type");
	}
	String objectId = getDataField("r_object_id");
	IDfPersistentObject object = getSession().getObject(DfId.valueOf(objectId));
	return object.getType().getName();
}

2 thoughts on “r_object_id. Type tag

  1. Again very nice observation.
    Do you measure the performance win to include AND r_object_id LIKE ‘0b%’ for folders with more than 500 documents?

    Thanks,
    Jens

    Like

  2. It depends on distribution of data, for “classical” execution plan, i.e.:

    -------------------------------------------------------------
    | Id  | Operation                      | Name               |
    -------------------------------------------------------------
    |   0 | SELECT STATEMENT               |                    |
    |   1 |  SORT AGGREGATE                |                    |
    |   2 |   NESTED LOOPS                 |                    |
    |   3 |    NESTED LOOPS                |                    |
    |*  4 |     INDEX RANGE SCAN           | D_1F024BE980000010 |
    |*  5 |     TABLE ACCESS BY INDEX ROWID| DM_SYSOBJECT_S     |
    |*  6 |      INDEX UNIQUE SCAN         | D_1F024BE98000010B |
    |*  7 |    INDEX UNIQUE SCAN           | D_1F024BE980000145 |
    -------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - access("DM_SYSOBJECT_R2"."I_FOLDER_ID"='0c024be980000105')
       5 - filter("XB_"."I_IS_DELETED"=0 AND "XB_"."I_HAS_FOLDER"=1)
       6 - access("XB_"."R_OBJECT_ID"="DM_SYSOBJECT_R2"."R_OBJECT_ID")
       7 - access("XB_"."R_OBJECT_ID"="OG_"."R_OBJECT_ID")
    

    VS

    -------------------------------------------------------------
    | Id  | Operation                      | Name               |
    -------------------------------------------------------------
    |   0 | SELECT STATEMENT               |                    |
    |   1 |  SORT AGGREGATE                |                    |
    |   2 |   NESTED LOOPS                 |                    |
    |   3 |    NESTED LOOPS                |                    |
    |*  4 |     INDEX RANGE SCAN           | D_1F024BE980000010 |
    |*  5 |     TABLE ACCESS BY INDEX ROWID| DM_SYSOBJECT_S     |
    |*  6 |      INDEX UNIQUE SCAN         | D_1F024BE98000010B |
    |*  7 |    INDEX UNIQUE SCAN           | D_1F024BE980000145 |
    -------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - access("DM_SYSOBJECT_R2"."I_FOLDER_ID"='0c024be980000105' AND
                  "DM_SYSOBJECT_R2"."R_OBJECT_ID" LIKE '0b%')
           filter("DM_SYSOBJECT_R2"."R_OBJECT_ID" LIKE '0b%')
       5 - filter("XB_"."I_HAS_FOLDER"=1 AND "XB_"."I_IS_DELETED"=0)
       6 - access("XB_"."R_OBJECT_ID"="DM_SYSOBJECT_R2"."R_OBJECT_ID")
           filter("XB_"."R_OBJECT_ID" LIKE '0b%')
       7 - access("XB_"."R_OBJECT_ID"="OG_"."R_OBJECT_ID")
           filter("OG_"."R_OBJECT_ID" LIKE '0b%')
    

    DEV database performs one extra read per non-folder object, but PROD database performs 5 extra reads per non-folder object.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s