Thursday, May 30, 2013

When tiny changes come back to bite

A Java network application that connected to a storage array's control port via an SSL socket mysteriously quit working when moved from Java 6 to Java 7. Not *all* the time, just in certain configurations. The application was written based on Oracle's own example code, which hasn't changed since the mid-'oughts, so everybody was baffled. My assignment: figure out what was going on.
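For context, the client side boiled down to something like Oracle's classic SSLSocket example. Here's a minimal sketch -- the host, port, and login message are made-up stand-ins, not the array's real protocol:

    import java.io.*;
    import javax.net.ssl.*;

    public class ControlPortClient {
        public static void main(String[] args) throws IOException {
            SSLSocketFactory factory =
                (SSLSocketFactory) SSLSocketFactory.getDefault();
            SSLSocket socket =
                (SSLSocket) factory.createSocket("array.example.com", 5989);
            try {
                Writer out = new OutputStreamWriter(
                    socket.getOutputStream(), "UTF-8");
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), "UTF-8"));
                out.write("<login user=\"admin\"/>\n");  // first transaction: log in
                out.flush();
                System.out.println(in.readLine());       // the array's reply
            } finally {
                socket.close();
            }
        }
    }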

The first thing I did was create a test framework with Java 6, Java 7, and a Python re-implementation, and validate that yes, the problem was something Java 7 was doing. Java 6 worked. Python on the CPython interpreter worked. Java 7 didn't work: it logged in (the first transaction), then the next transaction (actually sending a command) failed. Python on the Jython engine on top of Java 7 didn't work either. I tried both Oracle's official Java 7 JDK and Red Hat's OpenJDK that came with Fedora 18; neither worked. So it clearly had something to do with differences in the Java 7 SSL stack.

I then went to Oracle's site to look at the change notices covering the differences between Java 6 and Java 7. Nothing jumped out. Setting sun.security.ssl.allowUnsafeRenegotiation=true and jsse.enableSNIExtension=false were the two things that looked like they might account for some difference; maybe the old storage array was using an old protocol. So I ran the program under Java 6 with -Djavax.net.debug=all, found out what protocol it was negotiating, then hardwired my SSL socket's enabled-protocol list to match. I tested again under Java 7, and it still didn't work.
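For the record, the experiment looked roughly like this -- the target and the pinned protocol name are placeholders, but the property names are the real ones from the release notes:

    import javax.net.ssl.*;

    public class ProtocolPin {
        public static void main(String[] args) throws Exception {
            // Run with -Djavax.net.debug=all to dump the handshake and
            // record layer. These were the two candidate workarounds from
            // the release notes; neither turned out to help:
            System.setProperty("sun.security.ssl.allowUnsafeRenegotiation", "true");
            System.setProperty("jsse.enableSNIExtension", "false");

            SSLSocketFactory factory =
                (SSLSocketFactory) SSLSocketFactory.getDefault();
            SSLSocket socket =
                (SSLSocket) factory.createSocket("array.example.com", 5989);
            // Pin the socket to whatever protocol the Java 6 trace showed
            // ("TLSv1" here is just for illustration):
            socket.setEnabledProtocols(new String[] { "TLSv1" });
            socket.startHandshake();
            socket.close();
        }
    }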

Then I ran the program with -Djavax.net.debug=all in both environments. The Java 6 was the OpenJDK from the CentOS 6.4 distribution; the Java 7 was the OpenJDK on Fedora 18. Two screens later, one on each computer, the output on both was identical -- right up until the transmission of the second SSL packet. The right screen (the Fedora 18 screen) split it into *two* SSL packets.

But why? I downloaded the source code of OpenJDK 6 and OpenJDK 7, extracted them, and set out to figure out what was going on. The end result: a tiny method in SSLSocketImpl.java called needToSplitPayload() gets called from the SSL write path, and if it says yes, the packet gets split. So... is the cipher in CBC mode? Yes. Is it the first app output record? No. Is Record.enableCBCProtection set? Wait, where is *that* set?! I head over to Record.java and find that it's read from the property "jsse.enableCBCProtection" when the class is first loaded.
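Paraphrased from the OpenJDK 7 source (from memory, so treat the exact shape as approximate), the decision tree looks like this:

    // sun.security.ssl.SSLSocketImpl, lightly paraphrased: split only when
    // the protocol is TLS 1.0 or earlier, the cipher runs in CBC mode, this
    // is not the first application record, and the protection hasn't been
    // disabled via the system property.
    boolean needToSplitPayload() {
        return (protocolVersion.v <= ProtocolVersion.TLS10.v) &&
               cipherSuite.cipher.isCBCMode() &&
               !isFirstAppOutputRecord &&
               Record.enableCBCProtection;
    }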

The fix: I call System.setProperty("jsse.enableCBCProtection", "false"); in my initializer (or set it on the command line) and everything works.
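Both forms work, as long as the property is set before the JSSE classes load -- Record reads it into a static field at class-initialization time, so setting it after the first SSL connection is too late:

    // Early in startup, before any SSL classes are touched:
    System.setProperty("jsse.enableCBCProtection", "false");

    // ...or on the command line (MyApp is a stand-in for your main class):
    //   java -Djsse.enableCBCProtection=false MyApp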

So what's the point of CBC protection? It's a countermeasure against the BEAST attack: under TLS 1.0 and earlier, CBC ciphers use predictable initialization vectors, and an attacker who can snoop the traffic and inject chosen plaintext can recover important details of the session. Splitting each application record defeats that. The problem is that in my application, the receiving storage array apparently wants packets to stay, well, packets, not multiple packets. It's not a problem for the code I wrote; I look for begin and end markers and don't care how many packets it takes to collect a full set of data to shove into my XML parser. But I have no control over the legacy storage array end of things. It apparently does *not* look for begin and end markers -- it just grabs a packet and expects it to be the whole tamale.
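My end's framing logic looks roughly like this -- the end tag is a hypothetical stand-in for the real protocol's marker:

    import java.io.*;

    // Collect one complete reply, no matter how many SSL records (or TCP
    // segments) it arrives in, then hand the whole thing to the XML parser.
    static String readReply(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.append(chunk, 0, n);
            if (buf.indexOf("</response>") != -1) {  // hypothetical end marker
                return buf.toString();
            }
        }
        throw new EOFException("connection closed mid-reply");
    }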

So all in all, CBC protection is a Good Thing -- except in my application, where a legacy storage array cannot handle it, or in a wide variety of other applications where it slows things down to the point of uselessness by breaking nice big packets into teeny little packets that clog the network. But at least Oracle gave us a way to disable it for these applications. The only thing I'll whine about is this: Oracle apparently implemented this functionality *without* adding it to their list of changes. So I had to actually reverse-engineer the Java runtime to figure out that one obscure system property is apparently set to "false" by default in OpenJDK 6 and set to "true" by default in OpenJDK 7. Which is not The Way It Spozed To Be, but at least, thanks to the power of Open Source, it was not a fatal problem -- just an issue of reading through the source code of the runtime until I located the part that broke up SSL packets, then tracing backwards through the decision tree up to the root. Chalk up one more reason to use Open Source: if something mysterious is happening, you can pretty much always figure it out. It might take a while, but that's why we make the big bucks, right?

-ELG

Wednesday, May 1, 2013

ORM

I'm going to speak heresy here: object-relational mappers such as Hibernate are evil. I say that as someone who wrote an object-relational mapper back in 2000 -- the BRU Server server is written as a master class that maps an object to a record in a MySQL database, inherited by child classes that implement the specific record types. The master class takes care of adding a table to the database if it doesn't exist, as well as populating the Python objects on queries. The child classes take care of business logic and of generating the queries that don't map well to the ORM, passing the query results to generators that produce proper object sets out of SQL data sets.
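To make the shape of the thing concrete, here's a rough sketch of the pattern -- in Java rather than the original Python, with made-up names and an everything-is-TEXT schema for brevity:

    import java.lang.reflect.Field;
    import java.sql.*;

    // The master class: maps one object to one record, using reflection the
    // way the Python original used introspection.
    abstract class DbRecord {
        String tableName() { return getClass().getSimpleName().toLowerCase(); }

        // Add the table to the database if it doesn't already exist.
        void createTable(Connection conn) throws SQLException {
            StringBuilder cols = new StringBuilder();
            for (Field f : getClass().getDeclaredFields()) {
                if (cols.length() > 0) cols.append(", ");
                cols.append(f.getName()).append(" TEXT");
            }
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS " + tableName()
                           + " (" + cols + ")");
            }
        }

        // Store this object as one row.
        void insert(Connection conn) throws SQLException, IllegalAccessException {
            Field[] fields = getClass().getDeclaredFields();
            StringBuilder names = new StringBuilder(), marks = new StringBuilder();
            for (Field f : fields) {
                if (names.length() > 0) { names.append(", "); marks.append(", "); }
                names.append(f.getName());
                marks.append("?");
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO " + tableName() + " (" + names
                    + ") VALUES (" + marks + ")")) {
                for (int i = 0; i < fields.length; i++) {
                    fields[i].setAccessible(true);
                    ps.setObject(i + 1, fields[i].get(this));
                }
                ps.executeUpdate();
            }
        }
    }

    // A child class: just fields, plus whatever business logic it needs.
    class User extends DbRecord {
        String name;
        String email;
    }

Queries go the other direction the same way: run the SQL, then use the field list to populate fresh objects from each row.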

So why did this approach work so well for BRU Server, allowing us to concentrate on the business logic rather than the database logic and allowing its current owners to maintain the software for ten years now, while it fails so harshly for atrocities like Hibernate? One word: complexity. Hibernate attempts to handle all possible cases -- it's a general-purpose mapper -- and thus ends up producing terrible SQL queries while making things that should be easy difficult. The BRU Server team -- all four of us -- understood that if we were going to create a complete Unix network backup solution within the six months allotted to us, complexity was the enemy. We understood the compromises needed between the object model and the relational model, and because Python can express sets of objects as easily as individual objects, the "object=record" paradigm was fairly easy to handle. We wrote as much ORM as we needed -- and no more. In some cases we had to drop down to raw relational database programming, but because our ORM was so simple we had no problems with that. There were no exceptions thrown because the ORM cached data that no longer existed in the database, and the actual objects for things like users and servers could do their thing without worrying about how to read and write the database.

In the meantime, I have yet to run into a Spring/Hibernate project that actually managed to produce usable code performing acceptably well in any reasonable time frame with a team of a reasonable size. I was at one company that decided to use the PHP-based code that three of us had written in four weeks' time as the prototype for the "real" software, which of course was going to be Java and Spring and Hibernate and RESTful and all the right buzzwords, because real software isn't written in PHP (though our code solved the problem and didn't require advanced degrees to understand). Six months later, with a cast of almost a dozen and the project no closer to release than at the beginning, the whole thing was canned and the project team fired (not me -- I was on another project -- but I had a friend on that team, and she was not a happy camper). I don't know how much money was wasted on that project, but it undoubtedly hastened the demise of that company.

But maybe I'm just not well informed. It wouldn't be the first time, after all. So can anybody point me to a Spring/Hibernate project that is, say, around 80,000 lines of code written in under five months' time by a team of four people, that not only does database access but also does some hard-core hardware-level work slinging massive amounts of data around in a three-box client-server-agent architecture with multiple user interfaces (CLI and GUI/Web minimum)? That can handle writing hundreds of thousands of records per hour and then doing complex queries on those records with MySQL without falling over? We did this with BRU Server, thanks to Python and choosing just enough ORM for what we needed (not to mention re-using around 120,000 lines of "C" code for the actual backup engine components) -- no more, and no less. The ORM took me a whole five (5) days to write. Five. Days. That's it. Granted, half of that is because of Python and the introspection built into the very definition of the language. But. Five days. That's all you save by using Spring/Hibernate over taking a language such as Ruby or Python that has proper introspection and doing your own object-relational mapping. Five days. And I submit that the costs of Spring/Hibernate are far, far worse, especially for the 20% of projects that don't map well onto the Spring/Hibernate model -- such as virtually everything I do, since I'm all about system-level operations.

-ELG