iapi replacement

On Friday I ran into a dumb, but expected, problem: my favourite CLI tool (i.e. iapi) does not exist on OS X, and getting it to work on Windows is a challenge too: you need to install CS on Windows and then extract the CLI tools from the installation. Fortunately, with Groovy it is not a big deal to create your own advanced CLI:

Encryption madness. Part II

Do you remember the story about the dfc.crypto.repository parameter? Here is the continuation.

Case: I want to upgrade from 6.7SP2 to 7.2. What should I expect? The upgrade procedure will fail, because AES became a standard in 2001, but it seems that EMC found out about that only in 2013 and decided to break the upgrade procedure:

- We happy?
- Vincent! We happy?
- Yeah, we happy.

Don’t you see anything strange? I do not understand what the “crypto_mode = AES128_RSA1024_SHA256” and “crypto_mode = 3DES_RSA1024_SHA256” parameters mean. I do know what the abbreviations “AES”, “RSA”, “DES” and “SHA” mean, but I have no idea what they mean combined together. So, let’s check the documentation:

Cool, now I know even less than I knew 5 minutes ago 😦 What is my problem? I’m too curious, and I do know that 3DES and AES are symmetric ciphers, RSA is an asymmetric encryption algorithm which is primarily used for key exchange, and SHA is a hash function which SSL/TLS uses to build a MAC (message authentication code). Brought together, all these abbreviations make sense only for SSL/TLS. Let me demonstrate:

Andreys-MacBook-Pro:~ apanfilov$ openssl ciphers -v
DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1
DHE-DSS-AES256-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(256)  Mac=SHA1
AES256-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(256)  Mac=SHA1
EDH-RSA-DES-CBC3-SHA    SSLv3 Kx=DH       Au=RSA  Enc=3DES(168) Mac=SHA1
EDH-DSS-DES-CBC3-SHA    SSLv3 Kx=DH       Au=DSS  Enc=3DES(168) Mac=SHA1
DES-CBC3-SHA            SSLv3 Kx=RSA      Au=RSA  Enc=3DES(168) Mac=SHA1
DES-CBC3-MD5            SSLv2 Kx=RSA      Au=RSA  Enc=3DES(168) Mac=MD5 
DHE-RSA-AES128-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(128)  Mac=SHA1
DHE-DSS-AES128-SHA      SSLv3 Kx=DH       Au=DSS  Enc=AES(128)  Mac=SHA1
AES128-SHA              SSLv3 Kx=RSA      Au=RSA  Enc=AES(128)  Mac=SHA1
DHE-RSA-SEED-SHA        SSLv3 Kx=DH       Au=RSA  Enc=SEED(128) Mac=SHA1
DHE-DSS-SEED-SHA        SSLv3 Kx=DH       Au=DSS  Enc=SEED(128) Mac=SHA1
SEED-SHA                SSLv3 Kx=RSA      Au=RSA  Enc=SEED(128) Mac=SHA1
RC2-CBC-MD5             SSLv2 Kx=RSA      Au=RSA  Enc=RC2(128)  Mac=MD5 
RC4-SHA                 SSLv3 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=SHA1
RC4-MD5                 SSLv3 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=MD5 
RC4-MD5                 SSLv2 Kx=RSA      Au=RSA  Enc=RC4(128)  Mac=MD5 
EDH-RSA-DES-CBC-SHA     SSLv3 Kx=DH       Au=RSA  Enc=DES(56)   Mac=SHA1
EDH-DSS-DES-CBC-SHA     SSLv3 Kx=DH       Au=DSS  Enc=DES(56)   Mac=SHA1
DES-CBC-SHA             SSLv3 Kx=RSA      Au=RSA  Enc=DES(56)   Mac=SHA1
DES-CBC-MD5             SSLv2 Kx=RSA      Au=RSA  Enc=DES(56)   Mac=MD5 
EXP-EDH-RSA-DES-CBC-SHA SSLv3 Kx=DH(512)  Au=RSA  Enc=DES(40)   Mac=SHA1 export
EXP-EDH-DSS-DES-CBC-SHA SSLv3 Kx=DH(512)  Au=DSS  Enc=DES(40)   Mac=SHA1 export
EXP-DES-CBC-SHA         SSLv3 Kx=RSA(512) Au=RSA  Enc=DES(40)   Mac=SHA1 export
EXP-RC2-CBC-MD5         SSLv3 Kx=RSA(512) Au=RSA  Enc=RC2(40)   Mac=MD5  export
EXP-RC2-CBC-MD5         SSLv2 Kx=RSA(512) Au=RSA  Enc=RC2(40)   Mac=MD5  export
EXP-RC4-MD5             SSLv3 Kx=RSA(512) Au=RSA  Enc=RC4(40)   Mac=MD5  export
EXP-RC4-MD5             SSLv2 Kx=RSA(512) Au=RSA  Enc=RC4(40)   Mac=MD5  export

Doesn’t it look familiar? So, how does the crypto_mode parameter relate to the “mode based on the algorithm used to generate the AEK key”? It doesn’t: the documentation and the implementation are misleading. So, what is stored in the aek.key file? Nothing but a sequence of bytes that represents either a 3DES or an AES secret key.
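To make the comparison concrete, here is a minimal Python sketch that splits one line of the openssl listing above into its roles. The function name parse_cipher_line is hypothetical and the parsing only assumes the column layout shown above; it says nothing about how Content Server itself reads crypto_mode.

```python
def parse_cipher_line(line):
    """Split one line of `openssl ciphers -v` output into named parts."""
    parts = line.split()
    suite = {"name": parts[0], "protocol": parts[1]}
    for token in parts[2:]:
        # Tokens such as "Kx=DH" carry a role and an algorithm;
        # a trailing "export" marker has no "=" and is skipped.
        if "=" in token:
            role, algorithm = token.split("=", 1)
            suite[role] = algorithm
    return suite


line = "DHE-RSA-AES256-SHA      SSLv3 Kx=DH       Au=RSA  Enc=AES(256)  Mac=SHA1"
suite = parse_cipher_line(line)
print(suite)
```

Each suite names four separate jobs: key exchange (Kx), authentication (Au), encryption (Enc) and MAC. A key file that holds a single symmetric key needs only one of them.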

Now about the problem.

I have no idea why, but in D7, when the fallback settings are in effect, the encrypttext API command behaves extremely weirdly: the same call returns the new format one time and the old format the next time:

Session id is s0
API> encrypttext,c,xxx
...
DM_ENCR_TEXT_V2=AAAACHYQW2Ab8FTGW8gul3tK6Q8M+9RKuRSGgxypEcVmqaJLt2JE0+7tuKzVQzh78QTCxS6gcWAq7sOx
API> encrypttext,c,xxx
...
DM_ENCR_TEXT=W8gul3tK6Q8M+9RKuRSGgzw1veDUWIeDnWK6wT7gFEaVCVEGGORiHu/uzJNPhsAh

moreover:

~]$ iapi -X -Sapi
Running with non-standard init level: api
API> encrypttext,a,xxx
...
DM_ENCR_TEXT_V2=AAAACHYQW2Ab8FTGW8gul3tK6Q8M+9RKuRSGg63EjkDYL8tY2+Ox0El+nLbK+UgeDk0lAKELAnUx2Rnu
API> encrypttext,a,xxx
...
DM_ENCR_TEXT=W8gul3tK6Q8M+9RKuRSGg5J4DiemgxHojpL7UY6GbwClg7osvnn1GTnEmC672QgY
API> 

What does this behaviour mean? You are now unable to set the ldap password from DA: when you set it, DA executes the replicate_setup_methods docbase method (something like “execute do_method with method=’replicate_setup_methods’,arguments=’mkfile_encrypt_text password /u01/documentum/dba/config/DCMT_DEV/ldap_08002b8f801ccabb.cnt'”), this method executes the encrypttext API command and gets the wrong password.

Entropy

Today I was reading an ECN post:

We are facing the following issue in our environment. The Content server version is 7.2 on Linux. This environment was upgraded from 6.7 about 5 months back…. bla-bla-bla something about stability and performance…

It reminded me that over the last year I had helped resolve similar problems two or three times.
Here is the root cause: Randomness in virtual machines (don’t pay much attention to “virtual”).
Why did this happen after the migration from 6.7?
The answer is simple: in D7 SSL connections are enabled by default, so D7 consumes more entropy.
What to do?
Check the Marginalia page.
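A quick way to see whether a host is starved is to look at how much entropy the kernel reports. Below is a minimal Python sketch, assuming a Linux host that exposes /proc/sys/kernel/random/entropy_avail. The helper names and the threshold of 200 bits are illustrative assumptions, not values taken from Documentum or from the linked page.

```python
from pathlib import Path

# Linux-only kernel interface; the file is absent on other systems.
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text):
    """Convert the file content (a single number) to an int."""
    return int(text.strip())


def entropy_is_low(bits, threshold=200):
    """Flag a pool below the (arbitrarily chosen) threshold."""
    return bits < threshold


if ENTROPY_FILE.exists():
    bits = parse_entropy(ENTROPY_FILE.read_text())
    state = "low" if entropy_is_low(bits) else "ok"
    print(f"entropy_avail={bits} ({state})")
else:
    print("entropy_avail is not exposed on this system")
```

Running it a few times while Content Server is accepting SSL connections gives a rough idea of whether the pool keeps draining.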

Login tickets

On Thursday I was asked a naive question:

http://192.168.1.110:8080/D2/servlet/Download?uid…DM_TICKET=T0JKIE5VTEwgMAoxMwp2ZXJz…

this servlet is used by D2 for the PDF widget viewer
I was supposing that the ticket is a one time ticket, but it is not!
I don’t know why, maybe EMC is using the same ticket during all the session but I would have used a one time DM_TICKET to avoid to use it multiple time

Strictly speaking, Documentum tickets have nothing in common with one-time passwords. Let me explain. The main idea of one-time passwords is not to verify your credentials but to verify you as a person. For example, I have a bank account in a Russian bank (actually, I also have a bank account in an Australian bank, but IT in Australia is so immature that it cannot provide a real-world example). To use their internet banking I do the following: I open a browser, enter the internet banking URL and submit my credentials. After that the internet banking asks me for a one-time password and gives me two options to get one:

  • receive one-time password by sms
  • go to ATM and get a hard-copy with a list of ten one-time passwords (if I choose this option i)

I submit the one-time password and now I’m able to work with internet banking. So the bank assumes that the person who knows the credentials and is able to receive a one-time password by sms (or to go to an ATM and get a hard copy with a list of one-time passwords) is me. Actually, it’s a kind of tradeoff between security and convenience: the bank could create a more comprehensive authentication scheme, but then hardly anyone would use their internet banking.

Documentum tickets should be considered just temporary passwords which are valid for a specific period of time (see also login_ticket_timeout in dm_server_config):

API> getlogin,c,
...
DM_TICKET=T0JKIE5VTEwgMAoxMwp2ZXJzaW9uIElOVCBTIDAKMwpmbGFncyBJTlQgUyAwCjEKc2VxdWVuY2VfbnVtIElOVCBTIDAKMTI3MApjcmVhdGVfdGltZSBJTlQgUyAwCjE0NTUwNTE0MTIKZXhwaXJlX3RpbWUgSU5UIFMgMAoxNDU1MDUxNzEyCmRvbWFpbiBJTlQgUyAwCjAKdXNlcl9uYW1lIFNUUklORyBTIDAKQSA3IGRtYWRtaW4KcGFzc3dvcmQgU1RSSU5HIFMgMApBIDEwOCBETV9FTkNSX1RFWFRfVjI9QUFBQUVDbDhaMjd3dXpoK25GeHczWi81Mjl6Y3FidDV5R1FVNWRyc3dqeGhDN3d6bDZhOUFHbFNZYmFtNVc5M3pycHBWMWw2ODdoSkw0TFo5cnZHa29vM3ozWT0KZG9jYmFzZV9uYW1lIFNUUklORyBTIDAKQSA4IERDVE1fREVWCmhvc3RfbmFtZSBTVFJJTkcgUyAwCkEgMTEgZG9jdTcyZGV2MDEKc2VydmVyX25hbWUgU1RSSU5HIFMgMApBIDggRENUTV9ERVYKc2lnbmF0dXJlX2xlbiBJTlQgUyAwCjExMgpzaWduYXR1cmUgU1RSSU5HIFMgMApBIDExMiBBQUFBRUlxdDZTK0ZwMXRTbHNCK2xrbVN1cGVQWVUxSk9DT3JYckZsNEVlMHNxcEJWcnpocTd6eHRlcGRtM0JxeW5hdmdoS1cyRGRPaUFYK1dpcE9ERzdxM0oyT1VDd1p5L0xwczhxT1BXdEh3ckRHCg==
API> Bye
[dmadmin@docu72dev01 ~]$ base64 -d
T0JKIE5VTEwgMAoxMwp2ZXJzaW9uIElOVCBTIDAKMwpmbGFncyBJTlQgUyAwCjEKc2VxdWVuY2VfbnVtIElOVCBTIDAKMTI3MApjcmVhdGVfdGltZSBJTlQgUyAwCjE0NTUwNTE0MTIKZXhwaXJlX3RpbWUgSU5UIFMgMAoxNDU1MDUxNzEyCmRvbWFpbiBJTlQgUyAwCjAKdXNlcl9uYW1lIFNUUklORyBTIDAKQSA3IGRtYWRtaW4KcGFzc3dvcmQgU1RSSU5HIFMgMApBIDEwOCBETV9FTkNSX1RFWFRfVjI9QUFBQUVDbDhaMjd3dXpoK25GeHczWi81Mjl6Y3FidDV5R1FVNWRyc3dqeGhDN3d6bDZhOUFHbFNZYmFtNVc5M3pycHBWMWw2ODdoSkw0TFo5cnZHa29vM3ozWT0KZG9jYmFzZV9uYW1lIFNUUklORyBTIDAKQSA4IERDVE1fREVWCmhvc3RfbmFtZSBTVFJJTkcgUyAwCkEgMTEgZG9jdTcyZGV2MDEKc2VydmVyX25hbWUgU1RSSU5HIFMgMApBIDggRENUTV9ERVYKc2lnbmF0dXJlX2xlbiBJTlQgUyAwCjExMgpzaWduYXR1cmUgU1RSSU5HIFMgMApBIDExMiBBQUFBRUlxdDZTK0ZwMXRTbHNCK2xrbVN1cGVQWVUxSk9DT3JYckZsNEVlMHNxcEJWcnpocTd6eHRlcGRtM0JxeW5hdmdoS1cyRGRPaUFYK1dpcE9ERzdxM0oyT1VDd1p5L0xwczhxT1BXdEh3ckRHCg==
OBJ NULL 0
13
version INT S 0
3
flags INT S 0
1
sequence_num INT S 0
1270
create_time INT S 0
1455051412
expire_time INT S 0
1455051712
domain INT S 0
0
user_name STRING S 0
A 7 dmadmin
password STRING S 0
A 108 DM_ENCR_TEXT_V2=AAAAECl8Z27wuzh+nFxw3Z/529zcqbt5yGQU5drswjxhC7wzl6a9AGlSYbam5W93zrppV1l687hJL4LZ9rvGkoo3z3Y=
docbase_name STRING S 0
A 8 DCTM_DEV
host_name STRING S 0
A 11 docu72dev01
server_name STRING S 0
A 8 DCTM_DEV
signature_len INT S 0
112
signature STRING S 0
A 112 AAAAEIqt6S+Fp1tSlsB+lkmSupePYU1JOCOrXrFl4Ee0sqpBVrzhq7zxtepdm3BqynavghKW2DdOiAX+WipODG7q3J2OUCwZy/Lps8qOPWtHwrDG
[dmadmin@docu72dev01 ~]$ perl -MPOSIX -e 'print strftime "%a %b %e %H:%M:%S %Y\n", gmtime 1455051412'
Tue Feb  9 20:56:52 2016
[dmadmin@docu72dev01 ~]$ perl -MPOSIX -e 'print strftime "%a %b %e %H:%M:%S %Y\n", gmtime 1455051712'
Tue Feb  9 21:01:52 2016
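The same decoding can be scripted. Below is a minimal Python sketch that parses the layout visible in the dump above (a two-line header followed by alternating attribute and value lines). The function names and the shortened sample are hypothetical; the layout is inferred only from this one ticket, so treat it as an illustration rather than a description of the ticket format.

```python
import base64
from datetime import datetime, timezone


def parse_ticket(ticket_b64):
    """Decode a base64 ticket body (the part after DM_TICKET=) into a dict."""
    lines = base64.b64decode(ticket_b64).decode("ascii").splitlines()
    fields = {}
    # lines[0] is the "OBJ NULL 0" header and lines[1] the attribute count;
    # after that, "<name> <TYPE> S 0" lines alternate with value lines.
    for header, value in zip(lines[2::2], lines[3::2]):
        name, kind = header.split()[:2]
        if kind == "INT":
            fields[name] = int(value)
        else:
            # String values look like "A <length> <text>".
            parts = value.split(" ", 2)
            fields[name] = parts[2] if len(parts) > 2 else ""
    return fields


def to_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Shortened, hand-made sample in the same layout as the dump above.
sample = base64.b64encode(
    b"OBJ NULL 0\n3\n"
    b"create_time INT S 0\n1455051412\n"
    b"expire_time INT S 0\n1455051712\n"
    b"user_name STRING S 0\nA 7 dmadmin\n"
).decode("ascii")

fields = parse_ticket(sample)
lifetime = fields["expire_time"] - fields["create_time"]
print(fields["user_name"], to_utc(fields["create_time"]), lifetime)
```

For the ticket above, the difference between expire_time and create_time is 300 seconds, i.e. the ticket is a password that stays valid for five minutes.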

Initially, login tickets were used when Content Server needed to force an external client to authenticate using certain credentials, some examples:

But what you are observing in D2 is just the result of smelly code.

Wrong JDK

Recently I have noticed a dumb tendency: instead of installing Oracle JDK, devops/developers do something weird. They download the Documentum Foundation Classes distribution archive from the EMC portal and try to use the JDK shipped within that archive (another weird case is an attempt to deploy applications into the JBoss installed on the Content Server host). Never ever do that: the JRE/JDK bundled with Documentum products is broken. The problem is that since D7 EMC has been poisoning the bundled JRE with their cryptographic libraries. I already mentioned that here, but slow startup is only a part of the problem; the real problem is that these cryptographic libraries are broken (check the thorough explanation on ECN: xcp wait for email on gmail working for anyone?). Typical stacktraces are:

Caused by: java.security.cert.CertificateException: Certificate contains invalid public key: Unrecognized public key.
 at com.rsa.cryptoj.o.pk.g(Unknown Source)
 at com.rsa.cryptoj.o.pk.<init>(Unknown Source)
 at com.rsa.cryptoj.o.pj.<init>(Unknown Source)
 at com.rsa.cryptoj.o.pg.a(Unknown Source)
 at com.rsa.cryptoj.o.ot.engineGenerateCertificate(Unknown Source)
 at java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:339)
 at com.bea.common.security.jdkutils.X509CertificateFactory.engineGenerateCertificate(X509CertificateFactory.java:118)
 at java.security.cert.CertificateFactory.generateCertificate(CertificateFactory.java:339)
java.security.SignatureException: Certificate verify failed!
 at com.rsa.cryptoj.o.pj.a(Unknown Source)
 at com.rsa.cryptoj.o.pj.verify(Unknown Source)
 at com.dstc.security.util.licensing.License.getPublicKey(License.java:275)
com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "Connection reset ClientConnectionId:21963716-d0fc-4801-9904-f7c304848444".
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1668)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1668)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1324)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:992)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:828)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1012)
at java.sql.DriverManager.getConnection(DriverManager.java:579)
at java.sql.DriverManager.getConnection(DriverManager.java:243)
Caused by: java.io.IOException: Connection reset ClientConnectionId:21963716-d0fc-4801-9904-f7c304848444
at com.microsoft.sqlserver.jdbc.TDSChannel$SSLHandshakeInputStream.readInternal(IOBuffer.java:717)
at com.microsoft.sqlserver.jdbc.TDSChannel$SSLHandshakeInputStream.read(IOBuffer.java:700)
at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:895)
at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:883)
at com.rsa.sslj.x.aP.c(Unknown Source)
at com.rsa.sslj.x.aP.a(Unknown Source)
at com.rsa.sslj.x.aP.a(Unknown Source)
at com.rsa.sslj.x.aP.h(Unknown Source)
at com.rsa.sslj.x.cy.startHandshake(Unknown Source)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1618)

A FATAL error has occurred

Have you ever seen an error like this:

?

I believe everybody who has tried to deploy Documentum in a large enterprise has faced such spontaneous errors but never paid much attention, because the error message is completely misleading. The message [DM_SESSION_E_AUTH_FAIL]error: “Authentication failed for user” suggests that a dumb user entered an invalid password, so it’s not our issue. But actually, this error reveals a lot of problems related to session management in Documentum.

Root cause

If you take a careful look at the stacktrace, it becomes clear that the error originates not from the login page but from somewhere in the guts of WDK (put any other application here):

at com.documentum.web.formext.privilege.PrivilegeService.getUserPrivilege(PrivilegeService.java:57)
at com.documentum.web.formext.config.PrivilegeQualifier.getScopeValue(PrivilegeQualifier.java:69)
at com.documentum.web.formext.config.BoundedContextCache.retrieveQualifierScopeValue(BoundedContextCache.java:153)
at com.documentum.web.formext.config.ScopeKey.<init>(ScopeKey.java:57)
at com.documentum.web.formext.config.ConfigService.makeScopeKey(ConfigService.java:1482)
at com.documentum.web.formext.config.ConfigService.lookupElement(ConfigService.java:527)

This means that the user had already logged in successfully, so this is obviously not the user’s mistake. So, what really happens there? The basic description of the problem is: the user’s dfc session was either reused by another user or released by the application, and when the application tries to acquire a new dfc session, it fails. So, the first question is why the application fails to acquire a new dfc session. I believe there are a lot of reasons, but the most common one for large enterprises is the following: ldap authentication in Documentum is unstable. Most of the time it works as expected, but sometimes it fails and causes issues that are hard to diagnose.

Mitigation options

  1. disable D7 session pooling in dfc (i.e. set dfc.compatibility.useD7SessionPooling to false) – most customers noticed that the error started occurring more frequently after moving to the new version of dfc, and this is true, because the new pooling implementation tends to keep the number of dfc sessions as small as possible, so the number of authentication requests increases
  2. if you use bind_type=bind_search_dn in the ldap config, switch to bind_by_dn – it will decrease the number of ldap round-trips
  3. use the nearest ldap servers possible
  4. put the blame on EMC – authentication is not the kind of thing which must occur every 5 seconds due to poor application design

Some ideas about organising storage for content files

Memento mori

When planning how you are going to store content files, always think about disaster recovery. The typical case is: storage admins ask you how much disk space you need and after that they provision one large 10-20TB LUN for Documentum. This is completely wrong, because in case of disaster recovery your primary goal is to decrease RTO and RPO, but restoring “obsolete” files in 10-20TB LUNs won’t help you. Business users always have preferences about what needs to be recovered first: it may be content of specific/business-critical types or content loaded within the last two days/weeks/months. Also keep in mind that Documentum does not work without the content of the /System cabinet.

General considerations are:

  1. always prefer NAS to SAN – in general, NAS appliances are slower than SAN, but that is not an issue for Documentum. Furthermore, most NAS appliances have built-in capabilities which do not exist in SAN appliances. For example, if you need to scale your repository across multiple servers you have two options: create a cluster filesystem (cluster software costs extra money and requires extra maintenance) or use NAS. Typically a NAS appliance is a symbiosis of filesystem, network and disk drivers, so most NAS appliances have built-in replication and snapshot capabilities (SAN appliances may have such capabilities too, but the problem is that a SAN appliance has no idea what is stored in the underlying LUN)
  2. if you have no choice and SAN is the only option, always use a volume manager – never ever create a filesystem on a LUN without a volume manager, otherwise in the future you will be unable to perform extremely simple operations without downtime. For example, if I need to move all data from one storage to another (somebody decided to decommission an old appliance, or I decided to move old data to slow storage), I just add a new physical volume to the existing disk group, remove the old physical volume and wait a while as the volume manager moves data between the physical volumes online
  3. split content volumes into maintainable pieces – it may be 3-6 months’ worth of data or 1-2TB volumes; in my deployments I have found that 2TB is an optimal size
  4. try to understand the business value of the stored content and design storage accordingly; the Content Storage Services option is your friend here

Trusted Content Services

Never ever use the Trusted Content Services option for encrypting content files. The considerations are:

  • it does not bring any value from a security perspective, even stubborn EMC employees realised that
  • there are different opinions about how to properly use AV software in a Documentum environment: some guys think that real-time scanning is good and get something like: , other guys think that periodic AV scans of content volumes are OK, but what are you going to find if all content is encrypted? Moreover, viruses have a dumb nature: a file that is treated as harmless today may be recognised as harmful tomorrow, so encryption is no friend of AV.
  • it seems that EMC fails to provide backward compatibility for the TCS option across releases and operating systems, see “How will content be re-encrypted during TCS 7.2 upgrade?” and “Documentum Migration from AIX to Linux”