Have never thought that my colleagues may teach me something…
Yesterday I asked my colleague, who is trying to improve his skills in performance optimisation, whether he had any idea how to improve this SQL statement:
SELECT ALL dm_folder.r_object_id
FROM dm_folder_sp dm_folder
WHERE ( EXISTS
WHERE dm_folder.r_object_id = r_object_id
AND r_folder_path = :"SYS_B_00")
AND (dm_folder.object_name = :"SYS_B_01"))
AND ( dm_folder.i_has_folder = :"SYS_B_02"
AND dm_folder.i_is_deleted = :"SYS_B_03")
and, surprisingly, the answer was: “Yes, I have seen something similar on support site – EMC suggest to set dfc.query.should_include_object_name and dfc.query.should_include_object_name properties, something like:
Well, as was expected both dfc.query.should_include_object_name and dfc.query.should_include_object_name properties are not documented, so let discuss the problem more thoroughly.
Imagine that we are maintaining following folder structure in our docbase:
i.e. for every client we create the same folder structure and when we want to store invoice for particular client we do something like:
the problem is that upon link call DFC calls IDfSession#getFolderByPath method to retrieve folder object with particular path, and inside IDfSession#getFolderByPath method DFC does following: it cuts off object name part from the path (i.e. everything after last ‘/’) and builds following DQL query:
SELECT r_object_id FROM dm_folder
AND ANY r_folder_path='/CLIENTS/CLINET_1/INVOICES'
such implementation is bit weird for two reasons:
- when I do the same I just write something like “retrieve,c,dm_folder where any r_folder_path=”…” and do not bother myself about object name
- Content Server has a build-in FolderIdFindByPath RPC command:
which generates following effective SQL statement:
select r_object_id from dm_folder_r where r_folder_path = :p0
so, I have no idea why DFC performs extra logic here, moreover, in case of current DFC implementation we are getting overcomplicated SQL query and, sometimes database engine fails to build a good execution plan for this query (this is caused by dumb recommendation to set CURSOR_SHARING database parameter to FORCE and depending on docbase structure execution of such query may take minutes). Below are two possible execution plans for this query:
good (dm_folder_r is a leading table – querying dm_folder_r table by r_folder_path will always return not more than one row):
and bad (dm_folder_r is not a leading table – imagine that we have 1 million clients and hence 1 million INVOICE folders, so querying dm_sysobjec_s table by object_name first will return 1 million records):
in case of “retrieve,c,dm_folder where any r_folder_path=”…” execution plan is always good:
In 2011 (if my memory serves me right), I solved such performance problem by marking index on dm_folder_r(r_folder_path) as unique – in this case database engine always builds the correct execution plan because it knows that querying dm_folder_r table will always return not more than one row, however in recent versions DFC it is possible to disable it’s dumb behaviour by setting dfc.query.should_include_object_name and dfc.query.should_include_object_name properties – can’t understand why this wasn’t enabled by default.