Have never thought that my colleagues may teach me something…

Yesterday I asked my colleague, who is trying to improve his skills in performance optimisation, whether he had any idea how to improve this SQL statement:

SELECT ALL dm_folder.r_object_id
  FROM dm_folder_sp dm_folder
 WHERE     (    EXISTS
                   (SELECT r_object_id
                      FROM dm_folder_r
                     WHERE     dm_folder.r_object_id = r_object_id
                           AND r_folder_path = :"SYS_B_00")
            AND (dm_folder.object_name = :"SYS_B_01"))
       AND (    dm_folder.i_has_folder = :"SYS_B_02"
            AND dm_folder.i_is_deleted = :"SYS_B_03")

and, surprisingly, the answer was: “Yes, I have seen something similar on support site – EMC suggest to set dfc.query.should_include_object_name and dfc.query.should_include_object_name properties, something like:


Well, as was expected both dfc.query.should_include_object_name and dfc.query.should_include_object_name properties are not documented, so let discuss the problem more thoroughly.

Imagine that we are maintaining following folder structure in our docbase:


i.e. for every client we create the same folder structure and when we want to store invoice for particular client we do something like:


the problem is that upon link call DFC calls IDfSession#getFolderByPath method to retrieve folder object with particular path, and inside IDfSession#getFolderByPath method DFC does following: it cuts off object name part from the path (i.e. everything after last ‘/’) and builds following DQL query:

SELECT r_object_id FROM dm_folder 
WHERE object_name='INVOICES' 

such implementation is bit weird for two reasons:

  • when I do the same I just write something like “retrieve,c,dm_folder where any r_folder_path=”…” and do not bother myself about object name
  • Content Server has a build-in FolderIdFindByPath RPC command:
    API> apply,c,,FolderIdFindByPath,_FOLDER_PATH_,S,/dmadmin
    API> next,c,q0
    API> get,c,q0,result
    API> close,c,q0

    which generates following effective SQL statement:

    select r_object_id from dm_folder_r where r_folder_path = :p0

so, I have no idea why DFC performs extra logic here, moreover, in case of current DFC implementation we are getting overcomplicated SQL query and, sometimes database engine fails to build a good execution plan for this query (this is caused by dumb recommendation to set CURSOR_SHARING database parameter to FORCE and depending on docbase structure execution of such query may take minutes). Below are two possible execution plans for this query:

good (dm_folder_r is a leading table – querying dm_folder_r table by r_folder_path will always return not more than one row):

and bad (dm_folder_r is not a leading table – imagine that we have 1 million clients and hence 1 million INVOICE folders, so querying dm_sysobjec_s table by object_name first will return 1 million records):

in case of “retrieve,c,dm_folder where any r_folder_path=”…” execution plan is always good:

In 2011 (if my memory serves me right), I solved such performance problem by marking index on dm_folder_r(r_folder_path) as unique – in this case database engine always builds the correct execution plan because it knows that querying dm_folder_r table will always return not more than one row, however in recent versions DFC it is possible to disable it’s dumb behaviour by setting dfc.query.should_include_object_name and dfc.query.should_include_object_name properties – can’t understand why this wasn’t enabled by default.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s