Never happened before and happened again

As I described previosely, in 2012 we were implementing a large Documentum project, and that time we were observing a lot of performance and stability issues (actually, it was bit weird to observe such issues in mature product maintained by talented team). And one of issues was following: it was not possible to restart JMS during business hours (from my perspective it is not a big deal to do that if your application is properly designed – business users do not consume JMS, so, bouncing JMS should not affect business users):

  • about 5 of 10 restarts ended with deadlocks in java
  • about 4 of 10 restarts ended with crash of java process with SIGBUS error in libzip.so, something like:
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGBUS (0x7) at pc=0x00007f881a262982, pid=19993, tid=140222193350400
    #
    # JRE version: Java(TM) SE Runtime Environment (7.0_72-b14) (build 1.7.0_72-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libzip.so+0x4982]  newEntry+0x62
    #
    
  • and only 1 of 10 restarts was successful

So, we filed a couple SRs and after a while EMC support started claiming that first two issues were addressed in recent Documentum patches. Frankly speaking, during last 4 years I haven’t seen JMS crashes with SIGBUS error in libzip.so, but now I’m not sure that talented team had really fixed the issue, because two days ago our DevOps engineer claimed that Documentum 7.2 is still affected and he was a witness of three succeeded crashes of JMS with SIGBUS error in libzip.so – I just have lost faith in talented team 😦

Well, let discuss this odd error and try to fix it 🙂 If you google for java + sigbus + libzip.so you will find a plenty of bugs in openjdk JIRA and all of them states something like:

  • JVM maps central directory of zip file into memory using mmap system call due to performance considerations
  • it is expected that application receives SIGBUS error is mapped file gets overwritten:
           SIGBUS Attempted access to a portion of the buffer that does not
                  correspond to the file (for example, beyond the end of the
                  file, including the case where another process has truncated
                  the file).
    
  • there is a -Dsun.zip.disableMemoryMapping=true system property which was designed to “prevent crashes”

So, JDK seems to “work as designed”, nobody is going to fix this behaviour (not true for JDK9, but for current JDK versions we must take this behaivour as given) and the real root cause of our problem is “somebody” changes jar files on JMS while they are in use and, hence, causes it crashes, and this “somebody” is an odd TBO implementation. Is it possible to fix it? Actually, I have no idea what talented team was doing four years ago, but I have found robust solution within four hours (if my memory serves me right they was trying to fix the issue during four months) – the good starting point here is to find/write a good testcase and I have found it in JDK-8017777 SIGBUS in java.util.zip.ZipFile.getEntry:

$ java ZipFileTest
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0xa) at pc=0x00000001054cfe92, pid=5775, tid=0x0000000000004d03
#
# JRE version: Java(TM) SE Runtime Environment (8.0_111-b14) (build 1.8.0_111-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libzip.dylib+0x2e92]  newEntry+0x154
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/apanfilov/hs_err_pid5775.log
[thread 21251 also had an error]
[thread 22275 also had an error]
[thread 23299 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
[thread 24323 also had an error]
Abort trap: 6

Cool! Not lets check the behaviour of JVM with -Dsun.zip.disableMemoryMapping=true system property:

$ java -Dsun.zip.disableMemoryMapping=true ZipFileTest
Exception in thread "Thread-2" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)
Exception in thread "Thread-0" java.util.zip.ZipError: jzentry == 0,
 jzfile = 140315557132880,
 total = 1001,
 name = /var/folders/9d/m0q0kvvx3fj54mr_m3qs3twh0000gn/T/zipfile.zip ,
 i = 1,
 message = null
	at java.util.zip.ZipFile$ZipEntryIterator.next(ZipFile.java:519)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:499)
	at java.util.zip.ZipFile$ZipEntryIterator.nextElement(ZipFile.java:480)
	at ZipFileTest$ZipFileReader.run(ZipFileTest.java:65)
	at java.lang.Thread.run(Thread.java:745)

Well, JVM didn’t crash, but we got a plenty of errors, so, -Dsun.zip.disableMemoryMapping=true system property prevents crashes only and seems to be unusable 😦 Here we need to stop blaming JDK and start thinking about following: what does mean “mapped file gets overwritten”, i.e. are the following scenarios are doing the same or not:

$ file 1.txt
1.txt: empty
$ echo "talented team" > 1.txt
$ file 1.txt
1.txt: ASCII text
$ echo "talented team" > 1.txt.tmp
$ mv 1.txt.tmp 1.txt

Actually, the second scenario is safe from JVM perspective and small patch for initial testcase may prove this:

@@ -79,9 +78,12 @@ public ZipFileWriter(File aFile) {
         public void run() {
             int i = 1;
             ZipOutputStream zOut;
+            File tmp;
             try {
-                zOut = new ZipOutputStream(new FileOutputStream(file));
-            } catch (FileNotFoundException e1) {
+                tmp = File.createTempFile(ZipFileTest.class.getSimpleName(),
+                        null, file.getParentFile());
+                zOut = new ZipOutputStream(new FileOutputStream(tmp));
+            } catch (IOException e1) {
                 e1.printStackTrace();
                 return;
             }
@@ -102,6 +104,9 @@ public void run() {
             } finally {
                 try {
                     zOut.close();
+                    if(!tmp.renameTo(file)) {
+                        System.err.println("Can't rename");
+                    }
                 } catch (IOException e) {
                 }
             }

and here we are:

$ java ZipFileTest
$ 

What about DFC? Javassist rocks!