In-Depth

Dev Disasters -- Why They Happen, How to Overcome Them

Learn lessons about your own code through the errors committed by other developers.

We've all had them -- that moment when something you've done causes your head to forcefully meet your hand in horror. Or maybe the facepalm was brought on by someone else's incomprehensible screwup. Whatever the reason, mistakes are something everyone makes. What separates the everyday developer from the truly outstanding one is not the mistake -- it's what's learned from the mistake, and whether those lessons are applied to future projects.

What follows are tales of woe from Developer Land: monumental messes caused by inexperience, carelessness, arrogance and other failings. Whatever the cause, the effect was a program that didn't do what it was intended to do. Read on, and see what you can learn from the errors of others.

Or you could just laugh, instead.

Employees Must Wash Hands -- and Take Salary Lists
It was my first programming job, and my first assignment. I was tasked with a new payroll report: salaried employees, by descending dollar value. Because it was my first assignment, I took the report home after completing development so I could verify it before I released it.

On my way out of the building, I stopped in the men's room and ... wait for it ... left the report on the sink with every employee's salary on it. Boy, do I wish reports had been digital back then.

-- John Wollner

It's Not Always the Code
One of my first tasks upon being hired was working on a C# program to move images in bulk from a folder structure into a content management system (CMS). It took me a while to learn the ropes of the CMS, but with the help of my mentoring senior programmer, we got everything configured and working. Some months later, it was time to run this image import for one of our clients. It was high-pressure and had to be done by the next day, but it wasn't a big deal; the import had been working flawlessly.

So we got a ZIP of the latest files to be imported from the client, and I started the import process. To my great disappointment I saw that the main pages of the site were now missing images. I tried running it again with the same result. I asked my mentor, who had worked on it with me, to come help. We searched through the code, refactoring anything that remotely looked like it could be causing the problem. We ran the import on a few images and it succeeded. We ran the import on the rest of the images and it failed.


We kept this up for hours, trying every possible refactor we could think of. We debugged through the program line-by-line and found nothing suspicious. The code looked perfect. At 7 p.m. we were sick of looking at it, so we decided to catch our trains home. My mentor said he'd work on it from home after taking a break for a while.

The next morning I had an e-mail from my mentor saying he had found the issue. I opened the e-mail, still having no idea what could've been wrong with our image importer. What had been wrong? The images themselves. More than half the images had been corrupted when they were zipped on the client's server. The importer had been working perfectly, moving the corrupted images into the CMS and causing the browser to show them as missing.

At first I thought, "What a lot of work refactoring and debugging for nothing." But my mentor said I had learned a valuable lesson: As a programmer, it's good to question your code first, but not to the exclusion of everything else. I had been so focused on the code that it never occurred to me that something as simple as a corrupted ZIP file could make a perfectly functional program look bad.

-- Zachary Marks
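
One way to act on that lesson is to sanity-check the input files before suspecting the importer. The following is a minimal sketch, not the importer from the story: the class and method names are hypothetical, and it assumes the images are JPEG, PNG or GIF files. It only compares each file's leading bytes against well-known format signatures, so it would catch truncated or mangled headers but not every kind of corruption.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: flags files whose leading bytes do not match a known image format.
public static class ImageSanityCheck
{
    // Well-known file signatures ("magic numbers") for common image formats.
    private static readonly byte[][] KnownSignatures =
    {
        new byte[] { 0xFF, 0xD8, 0xFF },                               // JPEG
        new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, // PNG
        new byte[] { 0x47, 0x49, 0x46, 0x38 },                         // GIF ("GIF8")
    };

    // True if the header starts with one of the known signatures.
    public static bool LooksLikeImage(byte[] header)
    {
        return KnownSignatures.Any(signature =>
            header.Length >= signature.Length &&
            header.Take(signature.Length).SequenceEqual(signature));
    }

    // Reads up to 'count' bytes from the start of a file.
    private static byte[] ReadHeader(string path, int count)
    {
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[count];
            int read = stream.Read(buffer, 0, count);
            return buffer.Take(read).ToArray();
        }
    }

    // Returns the paths of files under 'folder' that fail the signature check.
    public static string[] FindSuspectFiles(string folder)
    {
        return Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories)
            .Where(path => !LooksLikeImage(ReadHeader(path, 8)))
            .ToArray();
    }
}
```

Running a check like FindSuspectFiles on the extracted folder before the import takes seconds, and a non-empty result points at the data rather than the code.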

A Thickheaded Update
For years, users at every retail location, warehouse and distribution center had accessed apps through the corporate intranet using their network ID; single sign-on took it from there. The Web client communicated with the thick client on the back end at corporate, and it just worked -- that is, until it didn't. It was left up to Bill G., a developer assigned to support the application server, to fix it.

For reasons unknown to Bill, the newest release of the thick client now required every user of the Web client to have an e-mail address (in the form spelled out by the error message in the logs).

Could Bill roll back the patch? Nope. Restore to a previous backup? No way. Doing so would mean the loss of a week's worth of data. Pressing on was the only option.

So, Bill dutifully created fake e-mail addresses for the 17,000-plus employees who both needed Web access to this application and didn't already have or need an e-mail address (most locations just relied on a bulletin board for important notices). He uploaded the addresses to the application's Employee table, leaving the other 59 columns alone.

When he clicked Enable Web Access in the thick client, it crashed after a minute or so, citing an "out of resources" message from the database server. Because the database server was on a 24-core, 64GB system, "out of resources" was not something that Bill had seen a lot.

Out of options, Bill entered a critical service request with the vendor's support team. The response was quick, asking him to run a SQL trace to help get to the bottom of things. At this point, though, SQL Server Profiler threw up its hands, reporting "SQL Profiler trace skipped records," either because the server was too busy to push trace messages, or because there wasn't enough bandwidth for SQL Profiler to capture all of them.

So Bill decided to trace Enable Web Access for a smaller group: a single retail location with eight employees. It worked.

At this point, Bill's mind whirled with ideas of what could've caused the disaster. A cross-join against a large table, perhaps? Bad criteria, causing it to pull unnecessary records? Indiscriminate column selection, causing it to use too much memory? Data-type issues, causing needless implicit conversions?

"Why can't it be all of those?" Bill thought, after seeing what SQL Profiler found:

SELECT CAST( 1 AS INT ) [_value]
  WHERE ( ( EXISTS ( SELECT * 
                       FROM [Employee] 
                         CROSS JOIN [Employee] [_all] 
                       WHERE ( ( 
                                    ( [Employee].[EmployeeID]='0204771' )
                                 OR ( [Employee].[EmployeeID]='0205518' )
                                 OR ( [Employee].[EmployeeID]='0213388' )
                                 OR ( [Employee].[EmployeeID]='0227638' )
                                 OR ( [Employee].[EmployeeID]='0232147' )
                                 OR ( [Employee].[EmployeeID]='0269010' )
                                 OR ( [Employee].[EmployeeID]='0309576' )
                                 OR ( [Employee].[EmployeeID]='278302' )
                               )
                           AND ( [Employee].[EmployeeID] <> [_all].[EmployeeID] )
                           AND ( [Employee].[Email]  = [_all].[Email] )
                         )
               )
      )
    )

-- Mark Bowytz
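
The traced query appears to ask one question: does any of the selected employees share an e-mail address with a different employee? It answers by pairing rows through a CROSS JOIN and filtering afterward. As a rough illustration only, the same question can be answered by grouping the rows once by e-mail. The class, method and tuple field names below are hypothetical, and the comparison is case-sensitive, which may differ from the database's collation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: answers the duplicate e-mail question by grouping instead of pairing rows.
public static class DuplicateEmailCheck
{
    public static bool AnySelectedEmployeeSharesEmail(
        IEnumerable<string> selectedIds,
        IEnumerable<(string EmployeeId, string Email)> employees)
    {
        var selected = new HashSet<string>(selectedIds);

        return employees
            // SQL equality never matches NULL, so rows without an e-mail are skipped.
            .Where(e => e.Email != null)
            // Group the rows by e-mail instead of comparing every row with every other row.
            .GroupBy(e => e.Email)
            // A hit is a group with more than one employee ID that includes a selected ID.
            .Any(group =>
                group.Select(e => e.EmployeeId).Distinct().Count() > 1 &&
                group.Any(e => selected.Contains(e.EmployeeId)));
    }
}
```

In SQL, the equivalent idea would be a GROUP BY on the Email column with a HAVING COUNT(*) > 1 filter, rather than joining the table to itself without a join condition.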

The Case of the Micro(brained)manager
I typically get brought in to clean up dev disasters, so I've seen plenty of problems. One thing I usually find while fixing these applications is that there's plenty of blame to go around -- no one involved with an application is blameless. Some of the problems include:

  • Customers aren't able to articulate what they want.
  • Customers pay so little that the services provider can't make a profit, which takes all the motivation out of the work.
  • A services provider will underbid in hopes of building a relationship with a customer. I actually sat in a meeting where another company's representative said, "We quote a low price, then when the money runs out, they have to negotiate with us because they're stuck with us. Then we can get them on price."

But none of this -- or my then-19 years of experience -- prepared me for one particular job. I got a call one day from a startup that wanted us to build a mapping services application from scratch. I met with them, and our company was on board. I had reservations regarding their business model from the start, but felt like they wouldn't have gotten the kind of angel investor funding they did if they hadn't thoroughly reviewed the marketplace and projected they were going to be successful.

After getting a couple of weeks in, the subject of how to handle the various "Fort" names came up. For example, there shouldn't be a difference between Fort Worth, Texas, and Ft. Worth, Texas, and similar types of names.

The manager at the startup wanted to do a really complicated rules-based engine to perform decoding. I suggested a fairly simple database lookup that would be easy to implement and easy to update, as all we had to do was to update database entries. The response was, "No, we can't do that, it just won't work because ..." And some flimsy explanation followed. I remember looking at him, thinking, "Really, that's your best excuse?" A few hours later, I had this resolved my way.
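
For illustration, the lookup idea can be sketched in a few lines of C#. This is not the code from the project: the class name is hypothetical, the entries are invented examples, and an in-memory dictionary stands in for the database table so the sketch stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a lookup-based place-name normalizer.
public static class PlaceNameNormalizer
{
    // Example entries only; in a database-backed version these would be rows
    // that can be added or changed without touching the code.
    private static readonly Dictionary<string, string> Abbreviations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Ft", "Fort" },
            { "Ft.", "Fort" },
            { "Mt", "Mount" },
            { "Mt.", "Mount" },
        };

    // Replaces each word found in the lookup with its full form.
    public static string Normalize(string placeName)
    {
        var words = placeName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words.Select(word =>
            Abbreviations.TryGetValue(word, out var fullForm) ? fullForm : word));
    }
}
```

With these entries, Normalize("Ft. Worth, Texas") and Normalize("Fort Worth, Texas") produce the same string, so both spellings resolve to the same place. Supporting a new abbreviation means adding one entry.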

Later on, a search question came up: "How should users search for places on their map?" I said that Google had already solved this problem, so we needed to implement a single textbox -- and that if we didn't emulate that, no one would ever use the app. But the manager didn't like the fact that Google would be prone to errors on the fourth or fifth page of results. He said we needed to do better.

A major argument ensued over this, which I lost. The result was a really complicated UI: The user had to work through six options before a search was performed. The users would always get back the correct data on the fifth or sixth page, but no one could figure out the UI.

There were other examples of situations like this: We'd ask questions, and the manager would try to tell us what our options were -- and then react negatively toward all the options. The end result was that important decisions were never made.

As you can well imagine, consumers were not receptive to this product. We tried to explain the problems, but the manager would never listen. Eventually, the startup's money ran out. We were asked to keep working, with the guarantee that eventually its investor would pay us. As of now, we've been waiting on that final check for five years -- I don't think we'll ever see it.

This experience caused me to formulate a set of guidelines for successful development:

  • Non-technical people aren't allowed to play programmer. Non-technical people aren't allowed to make technology decisions. I've since had this lesson reinforced.
  • Don't allow meetings to adjourn without decisions being made.
  • Listen to the end user. Before everything fell apart, the investor asked me to sit down with some users and see if I could figure out what they wanted. I spent a few days talking to users and found out that they wanted something similar to Foursquare -- only this was 18 months before Foursquare launched.

Throughout this nightmare, I was yelled at by the manager for not wanting to follow "his plan." That plan apparently involved ignoring user feedback. After all, how could those crazy users tell us anything about how to design a product?

-- Wallace McClure

The Quantum Invoice Bug
Like most in-house-written tools, the Initrode invoice tracking and management system had started out simple and lean -- but over time, it had grown beyond its original intent, morphing into a mash-up of Classic ASP, J#, C#, Visual Basic, ASP.NET and, of course, static HTML. So, when the announcement was made that time would be dedicated to a Code Cleanup Marathon, Andrew G., like everyone else on his team, was ecstatic. The system would be gutted and recreated in C# and ASP.NET. It would be a return to vanilla.

The general consensus was that the coming promotion to production was going to go well, but Andrew had one bug that resisted squashing.

After refactoring an old ASP.NET control to use Model-View-Presenter-style binding, QA reported an odd condition whereby a user browsing to the View My Invoices page would automatically delete their first uploaded invoice.

Andrew dubbed this bug "The Quantum Invoice Bug": The act of viewing the invoice caused it to no longer exist (Schrödinger would be proud).

Somehow, changing the order of the data binding was causing the QueryString of the current request to change, appending a "delete=123456" key-value pair to the end.

Andrew finally tracked the problem down to this code:

  public string GetEditUrl(string modeCode, string mode)
  {
    string sUrl = Request.RawUrl;
    string[] baseUrl = sUrl.Split('?');
    NameValueCollection queryString = Request.QueryString;
    // Reflect to readonly property
    PropertyInfo isreadonly = typeof(
      System.Collections.Specialized.NameValueCollection).
      GetProperty("IsReadOnly", 
      BindingFlags.Instance | BindingFlags.NonPublic);
    // Make collection editable
    isreadonly.SetValue(queryString, false, null);
    if (queryString.ToString().Contains("delete"))
    {
      // Remove query string parameter
      queryString.Remove("delete");
    }
    if (mode != "edit")
    {
      queryString.Add("delete", modeCode);
    }
    else
    {
      return "/invoicesupload.aspx?modeCode=" + modeCode + 
        "&navcode=98";
    }
    // Make collection readonly again
    isreadonly.SetValue(queryString, true, null);
    return baseUrl[0] + "?" + queryString;
  }

The function was supposed to build a URL based on the current URL, with various QueryString bits appended for various actions that could be invoked by clicking links on a page.

The System.Uri class was (more or less) immutable -- you would create a new one if you needed to alter the represented URI. However, someone figured out a way to circumvent the encapsulation in this case, in order to modify the current URI rather than building a new one.

In and of itself, this was a rather strange approach to programming. But the last couple of lines of code really took the cake.

After forcing the QueryString collection of the current request to be writable and modifying it, the code carefully restored the collection's read-only status. Then the collection was converted to a string, appended to the base URL and returned from the function. Because the change was made to the live request's own QueryString rather than to a copy, whichever "action URL" was rendered last by the rendering logic became the action for the current request.

Everyone asked: "Was this part of the new code?" The answer, as Andrew came to learn, was "no" -- the code was installed into production almost six years earlier, and was skipped during review because it was already in the target language.
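
For comparison, here is a minimal sketch of how such a URL could be built from a copy of the query string, leaving the current request untouched. It is an illustration under stated assumptions, not the fix Andrew's team applied: the class and method names are hypothetical, the caller passes in the base path and the current query string, and unlike the original the values are URL-encoded.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;

// Hypothetical sketch: builds the action URL without mutating the request's collection.
public static class EditUrlBuilder
{
    public static string BuildUrl(
        string basePath, NameValueCollection currentQuery, string modeCode, string mode)
    {
        if (mode == "edit")
        {
            return "/invoicesupload.aspx?modeCode=" +
                Uri.EscapeDataString(modeCode) + "&navcode=98";
        }

        // Work on a copy; the copy is writable even if the source is read-only.
        var query = new NameValueCollection(currentQuery);
        query.Remove("delete");
        query.Add("delete", modeCode);

        // Serialize the copy into key=value pairs, skipping any null key.
        var pairs =
            from key in query.AllKeys
            where key != null
            from value in query.GetValues(key) ?? Array.Empty<string>()
            select Uri.EscapeDataString(key) + "=" + Uri.EscapeDataString(value);

        return basePath + "?" + string.Join("&", pairs);
    }
}
```

Because only the copy changes, rendering any number of action links has no effect on the request being processed, and no reflection is needed.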

