Thursday, May 30, 2013

When tiny changes come back to bite

A Java network application that connected to a storage array's control port via an SSL socket mysteriously quit working when moved from Java 6 to Java 7. Not *all* the time, but just in certain configurations. The application was written based on Oracle's own example code which hasn't changed since the mid 'oughts, so everybody was baffled. My assignment: Figure out what was going on.

The first thing I did was create a test framework with Java 6, Java 7, and a Python re-implementation and validate that yes, the problem was something that Java 7 was doing. Java 6 worked. Python via the cPython engine worked. Java 7 didn't work, it logged in (the first transaction) then the next transaction (actually sending a command) failed. Python via the jYthon engine on top of Java 7 didn't work. I tried on both Oracle's official Java 7 JDK and Red Hat's OpenJDK that came with Fedora 18, neither worked. So it clearly had something to do with differences in the Java 7 SSL stack.

I then went to Oracle's site to look at the change notices on difference between Java 6 and Java 7. There was nothing that looked interesting. Setting and sse.enableSNIExtension=false were two things that looked like they might account for some difference. Maybe the old storage array was using an old protocol. So I ran the program under Java 6 with, found out what protocol it was using, then hardwired my SSL ciphers list to use that protocol as default. I tested again under Java 7, and it still didn't work.

Then I ran the program on the two different environments with on my main method on both Java 6 and Java 7. The Java 6 was the OpenJDK from the Centos 6.4 distribution. The Java 7 was the OPenJDK on Fedora 18. Two screens, one on each computer, later, and the output on both of them was identical -- up until the transmission of the second SSL packet. The right screen (the Fedora 18 screen) split it into *two* SSL packets.

But why? I downloaded the source code to OpenJDK 6 and OpenJDK 7 and extracted them, and set out to figure out what was going on here. The end result: A tiny bit of code in called needToSplitPayload() was called from the SSL engine, and if that code says yes, it splits the packet. So... is the cipher a CBC mode? Yes. is it the first app output record? No. Is Record.enableCBCProtection set? Wait, where is that set?! I head over to, and find that it's set from the property "jsse.enableCBCProtection" at JVM startup.

The end result: I call System.setProperty("jsse.enableCBCProtection", "false"); in my initializer (or set it on the command line) and everything works.

So what's the point of CBC protection? Well, the point is to deal with traffic analysis attacks against web servers. With snooping on the traffic you can figure out what transactions are happening and figure out important details of the session. The problem is that in my application, the receiving storage array apparently wants packets to stay, well, packets, not multiple packets. It's not a problem for the code I wrote, I look for begin and end markers and don't care how many packets it takes to collect a full set of data to shove into my XML parser, but I have no control over the legacy storage array end of things. It apparently does *not* look for begin and end markers, it just grabs a packet and expects it to be the whole tamale.

So all in all, CBC protection is a Good Thing -- except in my application, where a legacy storage array cannot handle it. Or in a wide variety of other applications where it slows down the application to the point of uselessness by breaking up nice big packets into teeny little packets that clog the network. But at least Oracle gave us a way to disable it for these applications. The only thing I'll whine about is this: Oracle apparently implemented this functionality *without* adding it to their list of changes. So I had to actually reverse-engineer the Java runtime to figure out that one obscure system property is apparently set to "false" by default in OpenJDK 6 and set to "true" by default in OpenJDK 7. Which is not The Way It Spozed To Be, but at least thanks to the power of Open Source it was not a fatal problem -- just an issue of reading through source code of the runtime until locating the part that broke up SSL packets, then tracing backwards through the decision tree up to the root. Chalk up one more reason to use Open Source -- if something mysterious is happening, you can pretty much always figure it out. It might take a while, but that's why we make the big bucks, right?


1 comment:

  1. Hi Eric, thanks a lot for this post.
    I had exactly the same issue and was going to start digging for answer in the JDK sources, but your post saved me a lot of time.