
Friday, January 30, 2015

User friendly

Here's a tip:

If your Open Source project requires significant work to install, work that would be easily scriptable by a competent software engineer, I'm not going to use it. The reason is that I'm going to assume you're either an incompetent software engineer, or the project is in an unfinished beta state not useful for a production environment. A competent software engineer would have written those install scripts if the product were in a finished, production-ready state. Meaning you're either incompetent, or your product is a toy fit for personal entertainment only.

Is this a fair assessment on my part? Probably not. But this is what 20 years of experience has taught me. Ignore that at your peril.

-ELG

Monday, November 5, 2012

The importance of strong product management

There's a lot of folks who whine about Windows 8, "Why did Microsoft have to change the UI? I like the old one!" The thing is, the old one simply isn't working well for a lot of people anymore. Hard drives have gotten so big, and people have installed so many programs on their systems, that the Start menu has achieved a depth that nuclear submarines would envy. Because the population is aging and eye-hand coordination is declining, both seeing all the tiny print on that Start menu and navigating it through several levels of sub-menus have become increasingly hard for a large percentage of the population. And finally, the Start menu paradigm simply doesn't work for tablets. If eye-hand coordination diving through the menus is an issue with a mouse, with a tablet touchscreen it would simply be impossible.

In other words, the notion that the Windows 8 UI change is all about "marketing" is pretty much nonsense. It's been well known for quite some time that the Cairo user interface introduced with Windows 95 has reached its logical limits, and ideas for changing the UI to meet the new challenges of the 21st century have been floating around inside Microsoft for years, if a look at the Microsoft Research web site is any guide. I'm sure that Marketing told engineering, "we need a UI that will be usable on a tablet! And oh, make it usable on a desktop too!", but at worst Marketing merely hurried what was already in progress, rather than directly causing the changes in Windows 8. The writing was on the wall for the Cairo UI, and sooner or later it would have been consigned to the dustbin of history regardless of Marketing's frantic panic about tablets.

So unlike a lot of people, I'm not surprised at all that Windows 8 has a significant shift in UI functionality. What I *am* surprised at is that it was done so badly. Microsoft has a lot of good people, and Windows 8 has all the raw tools in it to be a great operating system. Yet there are some needless complexities in its operation that shouldn't be there, and some important functionality missing that should be there, such as iOS- or Android-style icon folders (without those, you're in endless sideways-scrolling territory trying to get all your most-used programs onto the Start screen). So what gives?

In my opinion, the biggest issue with Windows 8 is caused by a clear failure of product management. Good product managers are hard to find because the job requires an understanding of customers at an intuitive level, such that you can devise workable requirements to meet their needs, yet sufficient technical chops to understand what is doable and guide engineering toward producing the product that is going to meet those requirements. It also requires taste -- the ability to look at a product and say, "yes, that is tasteful and will please our customers", or to look at a product and say, "that is a pile of garbage, get it out of my sight until you do X, Y, and Z to it." Furthermore, product managers have to be empowered to make that sort of judgement and have it stick. For better or for worse, Steve Jobs provides the template for what a strong product manager looks like -- opinionated, tasteful, with an intuitive understanding of the customer, enough technical chops to understand what can be done, and the power to make it stick.

Thing is, it's hard to find product managers like that, because the geeks and nerds who typically run engineering departments wouldn't know good taste if it bit them on their bum, while the sales flunkies who typically run marketing departments wouldn't know technical chops if said chops bit off their ear. You almost need a Steve Jobs to do it. Unfortunately Microsoft doesn't appear to have a Steve Jobs to find good product managers, or if it does have good product managers, it hasn't empowered them to make critical decisions about the product. Which is a shame. Because Windows 8 has a lot of good ideas, and the underlying technology is good. It just fails because of a lack of good taste (and courage, but see my prior blog on that), not because of a lack of technical chops.

Which just goes to show that putting out a great product isn't just a matter of having great technology. It has to be a team effort, and if you don't have that, what you'll get is either a product that doesn't meet the needs of the marketplace, or a product that's far less great than it should be. Something to think about if you're considering forming or joining a new startup. Do you have the kind of team that it will take? Does the company you are thinking of joining have such a team? Important questions, yet pretty much every startup I've encountered is all about the technology, and the rest of what it takes to have a great product is completely ignored. Which is probably why so many startups fail. So it goes.

-ELG

Monday, August 9, 2010

Action items

I had joined the company a few weeks earlier and was sitting in yet another raucous meeting. The latest attempt at a new product had failed, and the blame-casting and finger-pointing were at full tilt. Finally I sighed, and added my own two cents. "Look. I'm new here and I don't know what all has gone on, and I really don't care who's to blame for what; blame isn't going to get anything done. What I want to know is, what do we need to do now?"

Person 1: "Well, we failed because we weren't using software engineering system X" (where X is some software engineering scheme that was popular at the time).

"Okay, so we'll use software engineering system X, I have no objection to using any particular system, as long as we use one. What's the first thing we need to do, in that system?"

Person 2: "We need to figure out what we want the product to do."

"Okay, let's do that. What is the product supposed to do?"

We discussed it for a while, then from there the meeting devolved into a list of action items, and eventually broke up with another meeting scheduled to work on the detailed functional requirements. But on the whiteboard before we left, I had already sketched out the basics of "what we want it to do", and eventually that turned into an architecture and then a product that is still being sold today, many years later.

So what's my point? Simple: Meetings must be constructive. One of the things my teacher supervisors told me, when I first entered the classroom, was to always ask myself: what do I want the students to be doing? And then communicate it. A classroom where every student knows what he's supposed to be doing at any given time is a happy classroom. Idle hands being the devil's workshop and all that. The same applies to meetings. Unless it's intended to be an informational meeting, meetings should always be about "what do we want to do". And meetings should never be about blame-casting, finger-pointing, or any of the other negative things that waste time at meetings. No product ever got shipped because people pointed fingers at each other.

Everybody should have a takeaway from a development meeting -- "this is what I am supposed to be doing." Otherwise you're simply wasting time. So now you know why one of my favorite questions, when a meeting has gone on and on and on and is now drawing to a close but without any firm conclusion, is "what do we need to be doing? What are our action items?" We all need to know that we're on the same page and that we all know what we're supposed to be doing. That way there are no surprises, there are no excuses like "but I thought Doug was supposed to do that task!" when the meeting minutes show quite well that Doug was *not* assigned that action item, and things simply get done. Which is the point, after all: Get the product done, and out the door.

--ELG

* Usual disclaimer: The above is at least slightly fictionalized to protect the innocent. If you were there, you know what really happened. If you weren't... well, you got my takeaway, anyhow.

Sunday, August 8, 2010

Architectural decisions

Let's look at two products. The first product is a small 1U rackmount firewall device with a low-power Celeron processor and 256 megabytes of memory. It can be optionally clustered into a high-availability cluster so that if one module fails, the other module takes over. Hard drive capacity is provided by a 120GB hard drive or a 64GB SSD. The second is a large NAS file server with a minimum configuration of 4 gigabytes of memory and a minimum hard drive configuration of 3.8 terabytes. The file system on this file server is inherently capable of propagating transactions due to its underlying design.

So: How are we going to handle failover on these two devices? That's where your architectural decisions come into play, and your architectural decisions are going to in large part influence how things are going to be done.

The first thing to influence our decisions is going to be how much memory and CPU we have to play with. This directly influences our language choices, because the smaller and more limited the device, the lower level we have to go in order to a) fit the software into the device, and b) get acceptable performance. So for the firewall, we chose "C". The architect of the NAS system also chose "C". As an exercise for the reader, why do you think I believe the architect of the NAS system was wrong here? In order to get acceptable performance on the small module, we chose a multi-threaded architecture where monitor threads were associated with XML entries describing what to monitor, and faults and alerts were passed through a central event queue handler which used that same XML policy database to determine which handler module (mechanism) to execute for a given fault or alert event. Nothing was hard-wired; everything could be reconfigured simply by changing the XML. The architect of the NAS system had an external process sending faults and alerts to the main system manager process via a socket interface using a proprietary protocol, and the main system manager process then spawned off agent threads to perform whatever tasks were necessary -- but the main system manager process had no XML database or any other configurable way to associate mechanism with policy. Rather, policy for handling faults and alerts was hard-wired. Is hard-wiring policy into software wise or necessary if there is an alternative?
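
To make that policy-vs.-mechanism split concrete, here's a minimal sketch of the shape of the firewall design, in Python for brevity (the real code was "C"). The XML schema, the handler module names, and the link_is_down() check are all invented for illustration; the point is that the mapping from event to mechanism lives in the policy data, not in the code:

    # Sketch only: hypothetical schema and module names.
    import importlib
    import queue
    import threading
    import time
    import xml.etree.ElementTree as ET

    POLICY_XML = """
    <policy>
      <monitor name="eth0" interval="5"/>
      <event type="link_down" handler="mechanisms.failover"/>
      <event type="disk_full" handler="mechanisms.alert_admin"/>
    </policy>
    """

    events = queue.Queue()

    def link_is_down(name):
        return False  # stub; a real monitor thread would poll the hardware

    def monitor(name, interval):
        # One monitor thread per <monitor> entry in the XML; it only
        # detects and posts events, it never decides what to do about them.
        while True:
            time.sleep(interval)
            if link_is_down(name):
                events.put({"type": "link_down", "source": name})

    def event_loop(policy_xml):
        # Central event queue handler: the XML policy decides which
        # mechanism (handler module) runs for each fault or alert.
        root = ET.fromstring(policy_xml)
        handlers = {e.get("type"): e.get("handler") for e in root.iter("event")}
        for m in root.iter("monitor"):
            threading.Thread(target=monitor,
                             args=(m.get("name"), int(m.get("interval"))),
                             daemon=True).start()
        while True:
            event = events.get()
            module_name = handlers.get(event["type"])
            if module_name:
                importlib.import_module(module_name).run(event)

Change the XML and the behavior changes; nothing needs to be recompiled, which is exactly the flexibility the NAS design gave up by hard-wiring its policy.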

The next question is: what problem are we going to solve? For the firewall system, it's simple -- we monitor various aspects of the system, and execute the appropriate mechanism specified by the XML-configured policies when various events happen, with the goal of maintaining service as much as possible. One possible mechanism could be to ask the slave module to take over. Tweaking policy so that this only happens when there's no possibility of recovery on the active module is decidedly a goal, because there is a brief blink of service outage as the upstream and downstream switches get GARP'ed to redirect gateway traffic to a different network port, and service outages are bad. We don't have to worry about resyncing when we come back up -- we just resync from the other system at that point; if we had any unsynced firewall rules or configuration items that weren't on the other system at the point we went down, well, so what. It's no big deal to manually re-enter those rules. And in the event that we manage to dual-head (not very likely, because we have a hardwired interconnect and differential backoffs where the current master wins and does a remote power-down of the slave before the slave can do a remote power-down of the master), no data gets lost, because we're a firewall. We're just passing data, we're not serving it ourselves. All that happens if we dual-head is that service is going to be problematic (to say the least!) until one of the modules gets shut down manually.
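
For the curious, the differential backoff itself is about this simple. Here's a sketch; power_off_peer() and peer_alive() are invented stand-ins for the hardwired interconnect, and the asymmetric delays are the whole trick -- the current master always wins the race to power down the other module:

    import time

    MASTER_BACKOFF = 1.0  # seconds; the master shoots first
    SLAVE_BACKOFF = 5.0   # the slave waits long enough to lose the race

    def on_lost_heartbeat(i_am_master, peer_alive, power_off_peer):
        """Called when we stop hearing the other module's heartbeat."""
        time.sleep(MASTER_BACKOFF if i_am_master else SLAVE_BACKOFF)
        if peer_alive():           # peer came back during the backoff?
            return "resync"        # false alarm; just resync state
        power_off_peer()           # remote power-down over the interconnect
        return "take_over"         # safe: the other module is provably off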

For the NAS system, it's quite a bit harder. Data integrity is a must. Dual-heading -- both systems believing they are the master -- either requires advanced transaction merge semantics when the partition is resolved (merge semantics which are wicked hard to prove do not lead to data corruption), or must be avoided at all costs: all systems associated with a filesystem must immediately cease providing services if they haven't received an "I'm going down" from the missing peer(s), have no ability to force the missing peers to shut down (via IPMI or other controllable power), and have no way of assuring (via voting or other mechanisms) that the missing peers are going down. Still, we're talking about the same basic principle as the firewall, with one caveat -- dual-heading is a disaster, and it is better to serve nothing at all than to risk it.

For the NAS system, the architectural team chose not to incorporate programmable power (such as IPMI) to allow differential backoffs to assure that dual-heading couldn't happen. Rather, they chose to require a caucus device. If you could not reach the caucus device, you failed. If you reached the caucus device but there were no update ticks on the caucus device from your peer(s), you provided services. This approach is workable, but a) requires another device, and b) provides a single point of failure. If you provide *multiple* caucus devices, then you still have the potential for a single point of failure in the event of a network partition. That is because when partition happens (i.e. you start missing ticks from your peers), if you cannot reach *all* caucus devices, you cannot guarantee that the missing peers are not themselves updating the missing caucus device and thinking *you* are the down system. How did the NAS system architectural team handle that problem? Well, they didn't. They just had a single caucus device, and if anybody couldn't talk to the caucus device, they simply quit serving data in order to prevent dual-heading, and lived with the single point of failure. I have a solution that would allow multiple caucus devices while guaranteeing no dual-heading, based on voting (possibly weighted in case of a tie), but I'll leave that as an exercise to the reader.
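
In sketch form, the single-caucus-device rule they shipped looks something like this (the caucus API here is invented for illustration):

    import time

    TICK_TIMEOUT = 30.0  # seconds; how stale a peer's tick must be

    def may_serve(caucus, my_id, peer_ids):
        """Decide whether to keep serving data after losing peer heartbeats.

        caucus.write_tick() and caucus.read_tick() are hypothetical calls to
        the shared caucus device; read_tick() returns the timestamp of a
        node's last update, and both raise IOError if the device is
        unreachable.
        """
        try:
            caucus.write_tick(my_id, time.time())
            ticks = [caucus.read_tick(p) for p in peer_ids]
        except IOError:
            return False  # can't reach the caucus device: cease serving data
        now = time.time()
        # Serve only if every missing peer has gone silent on the caucus
        # device too; a peer still ticking there may well believe that *we*
        # are the down system.
        return all(now - t > TICK_TIMEOUT for t in ticks)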

So... architectural decisions:

  1. Remember your goals.
  2. Make things flexible.
  3. Use as high-level an architecture as possible on your given hardware, to ensure that #2 isn't a fib -- i.e., if what you're doing is doable in a higher-level language like Java or Python, for heaven's sake don't do it in "C"!
  4. Separate policy from mechanism -- okay, so this is the same as #2, but worth repeating.
  5. Document, document, document! I don't care whether it's UML, or freehand sketches, or whatever, but your use cases and data flows through the system *must* be clear to everybody on your team at the time you do the actual design, or else you'll get garbage.
  6. Have good taste.

Have good taste? What does that mean?! Well, I can't explain. It's like art. I know it when I see it. And that, unfortunately, is the rarest thing of all. I recently looked at some code that I had written when I was in college, code that implemented one of the early forum boards. I was a bit astonished that, even all these years later, the code was clearly well structured and showed a clean, well-conceived architecture. It wasn't because I had a lot of skill and experience, because, look, I was a college kid. I guess I just had good taste, a clear idea of what a well-conceived system is supposed to look like, and I don't know how that can be taught.

At which point I'm rambling, so I'm going to head off to read a book. BTW, note that the above NAS and firewall systems are, to a certain extent, hypothetical. Some details match systems I've actually worked on, some do not. If you worked with me at one of those companies, you know which is which. If you didn't, well, try not to read exact details as gospel of how a certain system works, because you'll be wrong :).

-ELG

Monday, April 5, 2010

SaaS and the Dot-com Set

One of the hilarious things that has come about with cloud computing and the advent of large scale SaaS is that I'm seeing the same kinds of arguments I saw back in the dot-com days, that this is a fundamentally different business model that doesn't obey the same rules as traditional software development. Which, of course, is utter nonsense. The method of delivery to customers has changed, but customers remain customers.

My response to all this:

1. Our primary requirement is to meet the needs of the customer. Some customers have legal requirements which preclude SaaS in the sense of SaaS in the cloud, but still wish to have the advantages of SaaS. Think doctors, schools, etc. -- it is actually illegal for them to host patient / student data outside of their own facilities on shared servers. But if we can give them the benefits of SaaS inside their facility using the same software that we have deployed into the cloud -- i.e., *not* a separate version of the software -- then we've fulfilled their needs without any (zero) additional development overhead. And BTW, just to counter one of Marko's points, anybody who structures their sales commissions to reward selling private SaaS rather than SaaS in the cloud is an idiot and deserves to fail; cloud SaaS is generally *much* less support burden on our part, it just doesn't meet the needs of certain customers.

2. I have not encountered any customers who want rapid updates of critical applications, ever. My boss once ordered me to deploy a new update to a school scheduling program in the middle of a school year. I pointed out that a) school secretaries and counsellors were currently doing mid-term schedule changes, b) school secretaries and counsellors had not been trained on the changes, which were significant (I had re-written the entire scheduling system from scratch going back to first principles, because the old system was incapable of handling some of the new scheduling paradigms that had come out, such as multi-shift scheduling and quarter-system scheduling), and thus c) it would be a fiasco. He said the old system was broken, so deploy the new one anyhow. I did. And got to say "I told you so" to my boss when it turned into the fiasco I'd predicted. My point: Users are fond of *saying* they want the latest, greatest features, but what they actually want is to get their job done. Paying attention to what users say, as vs. what they actually want, can be a huge mistake costing you a lot of money in additional support costs and losing a lot of customer goodwill. Not only were our support lines clogged solid for a week, my boss had to eat a lot of crow at the next user group meeting to get some of that lost goodwill back.

3. I am on Twitter. Yes, customers will Tweet stuff. 140 characters doesn't exactly get you in-depth commentary though. If you let Twitter guide your product development, what you get is a product designed by tweets, which is indistinguishable from a product designed by twits. I have only met one customer in my entire life who actually knew what he wanted (a school discipline coordinator who said, "I want a computerized version of these three state-mandated forms, and reports that fulfill the requirements of these four legally-required forms that I must submit at the end of the school year"). The rest have some vague idea, but you must get with them and engage them in a lengthy discussion complete with design proposals that include sample screen displays of what the application might look like. For one clustered storage product I actually spent more time talking to potential customers, writing proposals, and getting feedback than I spent implementing the actual product. Needless to say, that is *not* a process that occurs in 140 characters.

4. Anybody who goes into a business where there is an entrenched incumbent expecting to compete on features is an idiot in the first place. The incumbent has basically infinite resources at his disposal compared to you and is capable of implementing far more features than any newcomer. He will simply steal any features you innovate in order to stay out ahead. In the old days incumbents like IBM weren't capable of innovating rapidly. But this is the Internet era, and the successful giants have become much more nimble. If there's a feature you have that the incumbent doesn't have, expect him to have it soon. The way to win today is to change the game -- to do something so novel, so different in a fundamental way, that the incumbent could not match you without re-writing his entire product from scratch and ditching his entire current customer base. In short, competing based on features is a fool's game in today's era unless you're the incumbent. The way to win is to change the paradigm, not attempt to compete on features within an existing one.

5. Yes, selling "private SaaS" means we basically end up having to support multiple versions. But that is true regardless, unless we want to force customers into a death march to new versions. Some customers are comfortable with that; the majority, however, arrive at a version they like and just want to stick with it, much as the majority of Windows users are still using Windows XP, and the majority of Linux users are using Red Hat Enterprise Linux 5 (basically a three-year-old version of Linux) rather than the latest and greatest Fedora or Ubuntu. They'll accept security fixes, but that's it -- if you attempt to death-march them, they'll go to a competitor who won't.

I've been dealing with satisfying customer requirements for around 15 years now, and my actual experience of those 15 years is that customers are ornery folks: what they say and what they actually want are two different things, and your job as an architect and designer is to try to suss out what they actually want, which is often *not* the same as what they say. Young engineers tend to take customers at face value, then not understand why the customer rejects the result as not meeting his needs. I get called conservative sometimes because I call for slowing down release cycles, maintaining backward compatibility wherever feasible, extensive trials with customers prior to product release, etc., but my experience is that this is what customers want -- as vs. what they say they want, which is something else entirely.

For the record - my phone is an iPhone, and the only paper maps in my car are detailed 1:24000 USGS topographical maps not available on current GPS units with reasonable screen sizes. Just sayin' ;).

-EG

Friday, February 19, 2010

Is there such a thing as "open source management"?

The Open Source advocates have been talking about how we could apply "open source management" to things other than open source projects. But the question is, does such a thing exist? And my answer is... no. Open Source projects which do not have strong leadership fail. Commercial projects which do not have strong leadership fail. There is no "open source management" in the end, because people are people and software is software. I've been in both situations -- Open Source and commercial -- and software development is software development, in the end.

Any successful software development project of any scale, other than a one-off one-person utility, has some sort of leadership hierarchy where various people are in charge of various parts of the project, and where there's a mechanism to ensure that only high-quality code that complies with the general architectural vision of the project makes it into the project. Projects which do not develop this sort of leadership hierarchy fail -- they devolve into squabbles, or their architecture degenerates into such a mess that the project can't be successfully completed without a total re-write from scratch and a reboot. And if the quality of the people who make it into positions of being in charge is low, the project fails too, because the code base turns into a mess of buffer overflows, memory leaks, and unreadable/unfixable spaghetti code and the Object Hierarchy From Heck (the one that has 20 different levels of inheritance to do the simplest tasks, each of which reaches into the internals of its parent class to tweak something or another that it shouldn't be tweaking).

Whether you call these people "managers", "gatekeepers", "leaders", or whatever, software development is software development and leadership is leadership. If you have good leadership, your project succeeds. If you don't, it fails, or is so late to market and such a low-performing mess that you might as well not have bothered. That's how it's always been; whether it's Open Source or commercial is irrelevant. The only real difference is that Open Source contributors won't put up with the pure BS that is typical in huge corporations. But that sort of BS is not typical in the small startup environment either, which shares a lot in common with Open Source.

-ELG

And now for a photo of the Linux Penguin Command and Control Center...

Sunday, September 13, 2009

Language wars!

This one gets all the flames when development teams meet to decide on what language to use for the next project. The crusty old Unix guy in the corner says "C, it was good enough for Dennis and Ken, it's good enough for us." The Microsofty says "C++ of course. C is for Neanderthals." The J2E guy says, "Why does anybody want to use those antiquated languages full of stack smash attacks and buffer overflows anyhow? Write once, run everywhere!" And finally, the Python/Ruby guy says, "look, I can write a program all by my lonesome in two weeks that would take months for a whole team to write in Java, why is there even any question?"

And all of them are wrong, and here's why: They're talking about technologies, when they should be talking about the problem that needs solving and the business realities behind that.

I'll put myself pretty firmly in the Python/Ruby camp. My first Python product hit the market in 2000, and is still being sold today. My most recent project was also written primarily in Python. I also had a released product written largely in Ruby, albeit that was supposed to be only a proof-of-concept for the Java-based final version (but the proof of concept shipped as the real product, huh!). Still, none of these products are in Python or Ruby because of language wars, and indeed these products also included major portions written in "C" that did things like encryption, fast network file transfer, fast clustered network database transactions, and so forth. Rather, in each case the language was chosen because it was the right tool for the job. The first product was basically a dozen smaller projects written in "C" with a Python glue layer around them to provide a good user interface and database tracking of items. The second product, the one prototyped in Ruby, was prototyped in Ruby because Ruby and Java are quite similar in many conceptual ways (both allow only single inheritance, both have a similar network object execution model, etc.), and it made sense to prototype a quick proof-of-concept in a language similar to what the final product would be written in. The last project was written in Python because Python provided easy-to-use XML and XML-RPC libraries that made the project much quicker to get to market, but it also included major "C" components written as Unix programs.
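
That glue-layer pattern deserves a quick illustration. Here's a minimal sketch of the idea -- the fastcopy binary and its --checksum flag are invented names -- with Python doing the user interface and database bookkeeping, and shelling out to a small "C" program for the performance-critical work:

    import subprocess

    def transfer(src, dst, db):
        """Python glue around a hypothetical "C" transfer program."""
        result = subprocess.run(
            ["fastcopy", "--checksum", src, dst],  # the fast "C" component
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise RuntimeError("transfer failed: " + result.stderr.strip())
        db.record_transfer(src, dst)  # bookkeeping stays in high-level code
        return result.stdout

Each piece gets written in the highest-level language that can do its job, which is the same rule I lay out below.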

So, how do you choose what language to use? Here's how NOT to choose:

  1. The latest fad is XYZ, so we will use XYZ.
  2. ABC programmers are cheap on the market, so we'll use ABC.
  3. DEF is the fastest language and is the native language of our OS, so we'll use DEF.
  4. I just read about this cool language GHI in an airline magazine...
Rather, common sense business logic needs to be used:
  1. We need to get a product out the door as quickly as possible to beat the competition to market.
  2. We need tools that support rapid development of object- and component-oriented programs that are easy to modify according to what user feedback says future versions will need.
  3. Performance must be adequate on the most common platforms our customers will be using.
  4. Whatever tools we use must allow a reasonable user experience.
The reality is that if you're trying to get a product to market as quickly as possible, you want to use the highest-level language that'll support what you're trying to do. Don't write Python code when you can describe something in XML. Don't write Java code where Python will do the job. Don't write C++ code where Java will do the job. Don't write "C" code where C++ will do the job. Don't write assembly code where "C" will do the job. In short, try to push things to the highest-level language possible, and don't be ashamed to mix code. I once worked on a product that had Ruby, Java, and "C" components according to what was needed for a particular part of the product set. There were places where Ruby lacked functionality and its performance was too poor to handle the job, for example, but Java would do the job just fine. And there were places where absolute performance was needed, or where we were interfacing to low-level features of the Linux operating system, where we went straight to "C", either as a network component accepting connections and data in a specified format, or as a JNI library or its Ruby equivalent.

The whole point is to get product out the door in a timely manner. If you decide, "I will write everything in 'C' because it is the native language of Unix and my product will be smaller and faster", you can get a product out the door... in three years, long after your competition's product hits the market and gains traction. At that point you'll be just an also-ran. What you have to do is get product out the door, get it out the door as quickly as possible with the necessary functionality and features and performance (not theoretical best, but "good enough"), and then work on getting traction in the marketplace. Perfection is the enemy of good enough, and seeking perfection often doesn't even produce a product that's any closer to perfection than the product originally written to be "good enough".

That program that took three years to write? That company went bankrupt, and one reason was because the constant search for perfection ended up with a product that was inflexible, difficult to modify to work for different problem sets, and, frankly, poorly architected. Seeking the smallest memory footprint and theoretical best performance resulted in a product that failed in the marketplace because they missed the main reason we're writing programs in the first place: To meet customer needs. A program whose architecture is about memory footprint and performance at all costs is unlikely to have an architecture capable of being changed to meet changing customer needs, so not only were they late to market -- their product sucked. And the hilarious thing is, they didn't even manage to achieve the good performance their head architect claimed would happen with their architecture! So much for the three year "write everything in C" plan... late to market, poor performance, hard to modify, and they claim that those of us who advocate "good enough" rapid application development in the highest-level language feasible are "sloppy" and "advocating non-optimal practices"? Heh!

Next up... I talk about tools, and the odd story of how I got into the Linux industry in the first place. Note - I was the guy against using Linux in our shop. But that's a tale for another post :).

_E