Audit Trails and Impact on Documentum Performance

January 28, 2009

In my experience, we have to be careful about turning on audit trails in Documentum.  Auditing unnecessary events can create an excessive number of audit trail records in the database and slow down the system.

For example, audit can be turned on for Sysobjects on several events. Two of these events are a bit confusing.

1) dm_getfile and 2) dm_fetch

While these two events sound similar at first glance, their functions are different.
1) dm_getfile occurs when a document is viewed or exported.  This is a true “reading” of a document. You will want to turn this on to track which users downloaded or read a document.
2) dm_fetch occurs when Webtop gets a document’s attributes from the Content Server.  This can happen when you perform a search or just browse to a folder.  This means that if you browse to a folder with 50 documents, it could potentially create 50 new audit trail records in the database!  This is unnecessary unless you have a strict security policy.

Watch out for this.  In our system, we turned on auditing for dm_fetch and ended up creating 8 million useless records that consumed all the database tablespaces.  After we realised what was happening, we turned off auditing for dm_fetch and purged the unnecessary records, and our system performance improved dramatically.
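
To get a feel for how quickly dm_fetch auditing adds up, here is a rough back-of-the-envelope sketch in Python.  The usage figures (number of users, folder views per day) are made-up assumptions for illustration, not measurements from our system, and the function assumes the worst case of one audit record for every document listed.

```python
def estimate_fetch_audit_records(users, folder_views_per_user_per_day,
                                 docs_per_folder, days):
    """Worst-case count of dm_fetch audit records, assuming one record
    is written for every document listed on every folder view."""
    return users * folder_views_per_user_per_day * docs_per_folder * days


# Hypothetical usage figures, chosen only for illustration.
records = estimate_fetch_audit_records(
    users=200, folder_views_per_user_per_day=20, docs_per_folder=50, days=30)
print(f"Estimated dm_fetch audit records after 30 days: {records:,}")
```

With these assumed figures the estimate reaches 6 million records in a month, which is the same order of magnitude as the 8 million records we ended up with.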

There are some excellent posts on EMC forums regarding this.  Do check them out.


View a Custom Attribute Value Assistance Fixed List using IAPI

January 12, 2009
Here sp_doc is the custom doctype and sp_indus is the attribute with value assistance.
Enter the following command in the API Tester.  NOTE the “t” prefixed to the custom type name.
dump,c,tsp_doc.sp_indus
  map_display_string      [0]: Accountancy / Business, Finance & Law
                          [1]: Built Environment
                          [2]: Chemical & Life Sciences & Other Sciences
                          [3]: Engineering - Electrical & Electronics
                          [4]: Engineering - Mechanical / Manufacturing
                          [5]: Information Technology
                          [6]: Maritime Studies
                          [7]: Media & Design
  map_data_string         [0]: Indus01
                          [1]: Indus02
                          [2]: Indus03
                          [3]: Indus04
                          [4]: Indus05
                          [5]: Indus06
                          [6]: Indus07
                          [7]: Indus08
  map_description         [0]:
                          [1]:
                          [2]:
                          [3]:
                          [4]:
                          [5]:
                          [6]:
                          [7]:
  attr_name                  : sp_indus

Writing Performance/Load Testing Requirements

January 12, 2009

We are in the middle of performance testing on a project we are implementing with Documentum D6.  During the process I learnt that the requirements specs should be drafted very carefully when it comes to performance testing.  Here are some key points that may help you:

Define what the following terms mean in the context of  performance testing:
Concurrent users
Possible confusion here – Does it mean 500 users logged into the server and maintaining an idle session? OR 500 users logged in and performing active transactions?
Clearly specify that the performance testing shall be done with a sustained load of 500 concurrent users performing active transactions. There should not be any idle time between transactions except for the specified think-times.

Think-time

Think time is the time when the user is supposed to be reading documents or doing something that does not actively stress the system.
Obviously, using zero think time during performance testing will unduly load the system and is not realistic.  On the other hand, if you choose a think time that is too long, the system will not be tested correctly.
The think times chosen should be realistic.  Typically, 5-6 seconds between requests is a good start (since users get used to an application they use frequently).  If you want to be accurate, time the actions of a few users while they use a pilot/demo system.
If you are writing requirements specs, you MUST specify a think time that your vendor has to comply with.  Leaving this open is a sure way to invite arguments in future.

Response Time
Is this to be measured from user PC to server?  OR is it the time taken for the server to return a response to the load testing tool?
Usually performance testing is done using a tool like LoadRunner or Rational Performance Tester.  The tool is set up on one or more test servers and run to generate the user load and perform transactions.  There are two problems to take note of here:
- The tool will send and receive data at a network level (TCP).  So the time taken by a typical browser to render the HTML, JavaScript, etc. is not recorded.
- The end-users will probably be accessing the application from different network locations.  The response time they see will depend on the network latency.
It is not possible to measure the network latency and browser render time accurately.  So you will need to assume a value for this overhead, say 4 seconds.
Finally, if you think your users can accept a response time of 8 seconds for a transaction, you will have to specify that, when measured with a tool like LoadRunner, the expected response time for the transaction is 8 - 4 = 4 seconds.
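
The subtraction above can be written down as a tiny helper so that the assumption is explicit in the test plan.  This is only a sketch: the function name is mine, and the 4-second overhead is the assumed figure from the text, not a measured value.

```python
def tool_response_target(user_acceptable_s, assumed_overhead_s):
    """Response time the load-testing tool should report, after deducting
    the assumed network latency and browser rendering overhead."""
    target = user_acceptable_s - assumed_overhead_s
    if target <= 0:
        raise ValueError("assumed overhead leaves no time for the server")
    return target


# 8 seconds acceptable to users, 4 seconds assumed overhead (from the text).
print(tool_response_target(8, 4))  # prints 4
```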

Ramp-up period
Specify that 500 users shall be logged into the system within a period of 10 minutes.
You have two options after the user logs in: 1) continue with the next transaction and log out after completion, OR 2) log in and wait until the load of 500 users has been attained, and only then start performing transactions.  Specify which of these options you prefer.
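
As a quick illustration of what a 10-minute ramp-up for 500 users implies, the sketch below spreads the logins evenly over the period.  Even spacing is my assumption; a load-testing tool may ramp up in batches instead.

```python
def ramp_up_schedule(total_users, ramp_up_seconds):
    """Return evenly spaced login start times, in seconds from test start."""
    interval = ramp_up_seconds / total_users
    return [i * interval for i in range(total_users)]


# 500 users logged in within 10 minutes, as in the requirement above.
schedule = ramp_up_schedule(500, 10 * 60)
print(f"One login every {schedule[1] - schedule[0]:.1f} s; "
      f"last login starts at {schedule[-1]:.1f} s")
```

In other words, the system has to accept a new login roughly every 1.2 seconds during the ramp-up.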

Sustained user load
You must specify clearly that during the test period, the number of users logged into the system will be sustained at a level of 500 users. Users that complete transactions and log out will be replaced by logging in new users.  Otherwise, the system will be loaded initially but the user load will fall rapidly as the simulated users log out.

Caching
You cannot avoid caching in an enterprise system.
You should specify that caching caused by running the same search queries or downloading the same document should be eliminated. One way to achieve this is by randomly picking keywords or documents to download.
You can also use pacing creatively to avoid logging in the same user at short intervals.
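
A minimal sketch of the random-picking idea is shown below.  The keyword and document lists are hypothetical placeholders; in a real test they would come from your own test data, and the picking would be done inside the load-testing tool's script.

```python
import random

# Hypothetical pools; in a real test these would come from your own test data.
SEARCH_KEYWORDS = ["invoice", "contract", "policy", "drawing", "minutes"]
DOCUMENT_NAMES = [f"test_document_{n:04d}" for n in range(1, 1001)]


def pick_inputs(rng):
    """Pick a random keyword and document for one simulated transaction."""
    return rng.choice(SEARCH_KEYWORDS), rng.choice(DOCUMENT_NAMES)


rng = random.Random()  # use random.Random(42) instead for repeatable runs
for _ in range(3):
    keyword, document = pick_inputs(rng)
    print(f"search '{keyword}', then download {document}")
```

The larger the pools, the less likely it is that two simulated users hit the same cached search result or document.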

Response Time measured at 95% (or 99%) Percentile Value
Don’t use the average response time as a measure of acceptable response times.  The mathematical average tends to mask the actual response times.  An accepted measure is the 95th percentile value.
If you have 100 transactions, sort all the response times in ascending order; the value at the 95th position is the 95th percentile value.  In other words, 95% of the transactions complete in a time less than or equal to the 95th percentile value.
However, don’t go by this number alone.  Plot a graph of response time versus the number of users who experienced it.  This will show you whether most users are clustered at the fast response times or towards the longer ones.
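
The sort-and-pick procedure described above is often called the nearest-rank method.  Here is a small Python sketch of it; the sample response times are invented to show how the average can hide a slow tail.

```python
import math


def percentile_nearest_rank(response_times, pct):
    """Sort ascending and return the value at position ceil(pct% of n)."""
    if not response_times:
        raise ValueError("no response times recorded")
    ordered = sorted(response_times)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based position
    return ordered[max(rank, 1) - 1]


# Invented sample: 90 transactions at 2 s and 10 transactions at 20 s.
times = [2.0] * 90 + [20.0] * 10
average = sum(times) / len(times)
p95 = percentile_nearest_rank(times, 95)
print(f"average = {average:.1f} s, 95th percentile = {p95:.1f} s")
```

For this sample the average is 3.8 seconds, which looks acceptable, while the 95th percentile is 20 seconds.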

Stable Test Runs
When you conduct performance testing at high loads, some of the transactions will fail.  This could be due to hung threads, lack of server resources or other factors.   However, if a lot of transactions fail, then perhaps the system is not stable enough.   Depending on your requirement, you should specify that a test run is considered stable and valid for analysis if a suitable percentage (say 95%) of the transactions succeed.

It is also advisable to investigate why the remaining 5% of the requests failed, because it means that 5% of users will see errors.  If you are running an eCommerce or financial application, this is a substantial number of failed transactions and you may not be willing to accept it.
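
The stability criterion can be expressed as a simple check on the test results.  This is a sketch only: the function name is mine, and the 95% threshold is the example figure from the text, which you should replace with whatever your requirement specifies.

```python
def is_stable_run(succeeded, total, min_success_pct=95):
    """True if at least min_success_pct percent of transactions succeeded."""
    if total <= 0:
        raise ValueError("no transactions were executed")
    # Integer comparison avoids floating-point rounding at the threshold.
    return succeeded * 100 >= min_success_pct * total


print(is_stable_run(950, 1000))  # exactly 95% succeeded
print(is_stable_run(949, 1000))  # just below the threshold
```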

To be continued…..


Summary of objects updated by User-Rename job in Documentum

January 11, 2007

Documentum provides a job written in Docbasic that allows administrators to rename users and to reassign the jobs.  Documentum uses the user name in several places, for example to determine the owner of a document/object.

I have yet to understand why Documentum decided to use this approach instead of relating documents/objects to users through object-ids in some form of foreign-key relationship.  It could be for performance reasons.  But because of this design, there are some issues when it comes to user names in Documentum.  I will go into details in another post.

I will add another post with a more detailed summary of the steps involved in the process of renaming a user.  For now I am adding a summary table of the objects impacted.

(The table is getting truncated when I copy & paste from MS Word.  While I try to fix this problem, try using the original Word document.  Sorry about this.)

Download: Object-types-impacted-by-user-rename-job.doc


Best Practices in Naming Documentum Users and Groups

January 10, 2007

Constraints:

  1. The maximum allowed length for user and group names is 32 characters.
  2. Do not begin a name with the prefix dm. Documentum reserves this prefix for its own use.
  3. The name must consist of ASCII characters.
  4. User and group names are case sensitive.
  5. It is good practice not to choose names that conflict with DQL reserved words.

Note:

  1. Avoid using SQL and DQL reserved words for names. If a name must match a DQL reserved word, enclose the name in double quotes whenever you use it in a DQL query.
  2. The name must consist of characters compatible with the server OS code page of the Content Server.
  3. The name of a user or group must be unique among the user and group names in the repository. (This means a user cannot have the same name as a group.)
  4. It is better to restrict names to letters, digits and underscores. If special characters are used, third-party applications must take special care to escape them while working with Documentum (especially in DQL queries).
  5. To escape special characters such as spaces and apostrophes, enclose the string in single quotes.
  6. To escape DQL reserved keywords, enclose the string in double quotes when it is referenced.
  7. For groups, we noticed that the system changes all characters in the name to lower case.
  8. It was observed in practice that using ‘/’ & ‘\’ in the name causes problems during creation of accounts and user home-cabinets
  9. Other special characters tested were (, ), and the system accepts these names. However, they must be escaped while using DQL queries.
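
The constraints and notes above can be turned into a simple pre-check to run before creating accounts.  The sketch below is my own illustration, not a Documentum API: it checks only the length limit, the dm prefix and the letters/digits/underscores rule, and it does not check DQL reserved words or uniqueness within the repository.

```python
import re

MAX_NAME_LENGTH = 32
SIMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")  # ASCII letters, digits, underscore


def name_problems(name):
    """Return a list of reasons why a user or group name should be avoided."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    # The rule is applied literally here, so any name starting with "dm"
    # is flagged.
    if name.startswith("dm"):
        problems.append("starts with the reserved prefix dm")
    if not SIMPLE_NAME.fullmatch(name):
        problems.append("contains characters other than letters, digits, _")
    return problems


# Hypothetical names, for illustration only.
for candidate in ["finance_team_01", "dm_reviewers", "sales/asia"]:
    print(candidate, "->", name_problems(candidate) or "OK")
```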

Recommendation: To ensure that other applications can work with a unified group set, I recommend keeping names as simple as possible and restricting them to letters, digits and underscores. Spaces, apostrophes, brackets, slashes, single quotes, double quotes, $, #, @, &, *, etc. should be avoided as other applications