Keyboard keys in a pile

Tabs ARE Better Than Spaces?

For a few years, I’ve been working with a team that prefers spaces to tabs. More specifically, 2 spaces. I agree that this is a superior visual arrangement and having everyone use it benefits those of us who prefer it, those who don’t care, and will eventually bring those who don’t like it to see the light. Today, however, for reasons that will be more obvious in future posts, it occurred to me that tabs may actually be superior.

This revelation requires first admitting that this is a preference. You may code on a 32” curved 16:9 or a 20” 4:3 rotated to 3:4 portrait. You could be writing code on a flip phone for all I know. Whether you use tabs or spaces, the desired effect is the same: to provide consistent formatting to make the code easier for you to read.

This leads me to my argument for tabs: Tabs are the democratic choice. Tabs put the details of the implementation in th hands of the consumer. Most modern IDEs allow you to customize your tabs. Want tabs to be 7 spaces? Let your crazy-big-odd-numbered-tab freak flag fly! It’s much easier to customize tabs or replace them with spaces than to replace spaces with tabs. So I’m sure this debate will rage on and I will happily continue to use spaces for my code, but for you, my friend, I will give you tabs.

brain art science

The Science and Art of Code Structure

Clearly, I should have been in game development. I’ve been coding since the third grade so software was always in the cards for me. I have a degree in Mechanical Engineering so I have the physics chops to handle pretty much any kind of real-world simulation from projectile motion and particle physics to fluid dynamics and even heat transfer. My masters degree is in Industrial Engineering so I know about things like optimization and simulation that are helpful for AI (among other things). And games are fun!

What I realized recently, is that writing “good” code is an Industrial Engineering optimization problem. You see, optimization problems deal with multiple dimensions with often multiple optimal solutions. Note the use of the word “optimal” and not “best”. Optimization problems often have multiple variables and an astronomical number of potential solutions. So when you are optimizing code, what is your goal? Maximize speed of execution? Minimize size of compiled code? Minimize bandwidth used? Minimize processing power used? Minimize total operating cost? Maximize test code coverage? Why not all of them? Well…I’ll tell you why.

One of the most famous examples of an optimization problem and how mind-bogglingly many solutions they could have is the “traveling salesman problem“. The traveling salesman must visit some number of potential clients and then return home. In the simplest version, he is flying from city to city and the only concern is the order of the flights to the various airports. A much more practical and complex version is a package delivery company with a fleet of trucks needing to deliver packages over routes through a complex network of highways and city streets with variable traffic, one way streets, traffic lights, etc. Every variable multiplies the number of solutions by the number of possible options for that variable.
The number of possible solutions very quickly gets large – like more than the total number of atoms in the known universe large. What hope do we have of solving these problems? Luckily, we have pretty cool brains.

Malcolm Gladwell wrote a book titled Blink: The Power of Thinking Without Thinking. It’s a great book and I highly recommend reading it, but the gist is our brains are able to take in a vast array of information and come to a conclusion (an “optimal solution”) without consciously thinking about it. This ability to organically problem solve was also recently discovered in single cell organisms. This is why we are pretty good at coming up with decent solutions to these problems without knowing anything about linear algebra. Our brain operates on “heuristics” that our life experiences have allowed our brains accept without having to think about them. In the world of software development, the use of these heuristics are the “art” and allow us to optimize variables that are extremely more challenging and seemingly abstract such as “maximize code reuse” and “maximize maintainability” and, as I like to say, “minimize astonishment“.

Another important consideration in optimization is boundaries. For example, you may be able to purchase a certain amount of computing resources for a certain cost. Additional resources add more cost. This means that you don’t necessarily need to use an absolute minimum number of resources, you just need to make sure that the amount of resources used doesn’t trip the threshold or that if it does, you have a source of revenue to cover the added cost.

In future posts, I will dive into more detail on the day-to-day practical advise on optimizing your code, but first I wanted to take some time to lay some groundwork. In the early days of computers, you were greatly constrained by memory and processing power. You had a limited number of pixels that could be rendered on the screen in a limited number of colors. You could only store so many bytes of code. You were limited to a small number of operations the processor performed. You were limited in the language you could code in and the structure of that language. The optimization required was much more science with a small number of options for each variable. Now, the options are astronomical. What language do you use? What platforms or devices run that code? What communication channels does it leverage? How many millions of colors and pixels does it render? Sure, it is still important to minimize operating cost (storage, CPU cycles, bandwidth, etc.), but you can also get access to a significant amount of resources for little or no cost. This allows you to focus on the art – maintainability and reuse – first, and then solve the challenging scientific problems later (or not at all).

horse zebra and pony

Look for Horses, Not Zebras

I have two quotes from old friends that have stuck in my head in certain situations.  The first is, “dancing is like standing still, only faster.”  As a runner, I have adapted that to, “running is like falling down, only slower.”  The second is, “when you hear hoofbeats, think of horses, not zebras.”  As a programmer, I have found there is not much better advice than this second quote.

You would think that the more experience you have, the more likely you are to look for the obvious answer.  I’ve found that the opposite is true – while you are going to make “stupid” mistakes less frequently, the majority of your mistakes are still going to be “stupid”. Unfortunately with great power comes great ego, and the most difficult bugs are caused by the occurrence of something that was “never going to happen”.  I happened to stumble through 3 of these self-induced hair-pulling-out exercises in a matter of days.  To make matters worse, I don’t really code at work anymore.  This was a hobby project that I can only work on for an hour or two each night.

First was the infinite loop (and subsequent stack overflow) – one of the few errors that can’t be captured with standard error logging techniques.  This problem only happened on the production server so instead of doing the obvious and duplicating the data down to dev so I could debug, I decided to add debug logging until I found the problem.  This was the first symptom of my bravado.  My application is a state machine “engine” that runs in a loop. As things happen and the state changes, the loop eventually completes.  I narrowed down the problem to the state where the error occurred but had trouble getting closer than that.  I was convinced that some dark magic was at play and some data anomaly so I spent a few nights adding debugging, publishing, running, repeating.

Finally, I stopped looking for zebras.  Instead of continuing to try to zero in on the line of code through trial and error addition of debug logging, I put 3 lines of code in the most obvious places the problem could occur – the 3 while loops that could get called inside my main loop. That immediately pointed me to the loop in question so then I threw an exception when a condition causing an infinite loop occurred.  It probably took me all of 5 minutes to determine the state that caused the problem.

So next I had to figure out how the code got into that state.  All of the information I needed to solve the rest of the problem, I saw within minutes – and completely ignored.  I saw that the state in question could only come from the database because the key data was never set anywhere by code.  Therefore, the code that allowed the infinite loop condition must have been in the code that loaded the data from the database.  However, rather than looking right where all indications pointed, I looked for zebras.  Since I knew how to find the problem, I could now duplicate it on my dev box and debug.  That meant I could spend time inspecting variables at breakpoints to see why the problem was happening.  After all, this infinite loop was caused by a recursive linked-list relationship – something that was “never going to happen”.  There couldn’t have been an obvious reason for something like this to happen so I had dig for a needle in the haystack instead of following the thread attached to the needle.  In spite of myself, I did eventually figure out that the problem could absolutely be solved when the data was loaded into the engine.  Technically this is after the data is extracted from the database, but the recursive situation doesn’t matter until it gets to the engine so that’s where I put the fix to truncate the recursive linked list since the duplicate reference shouldn’t have been there anyway.

Fast forward a few days.  With my problem solved, I was able to run my engine for a few days and then started to look at the output.  Then I stumbled on a new error – “An error occurred while reading from the store provider’s data reader. See the inner exception for details” and the inner exception was “Arithmetic overflow error for data type tinyint, value = 2288”.  So, the expert programmer has managed to break the Entity Framework.  Of course I checked the “obvious” problem – that my model had the wrong data type.  Then I went off the deep end.  Unfortunately, there are people who have had legitimate problems with EF or linq-to-SQL mapping parameters incorrectly (2288 was a parameter, not a database value).  I converted my linq expression into a SQL command and then realized I was once again looking for zebras.  Instead of looking at the columns in my database that would map to tinyint, I assumed the problem was in the Entity Framework – code written by Microsoft and used by thousands if not millions of people – and not in MY code.  When I stepped through my code and looked at the parameters, I noticed there was a parameter that should’t have even been there.  That was my problem – I passed the parameters into my method in the wrong order!  Instead of passing “int, null, int, true” I was passing “int, int, null, true”.  The 2nd and 3rd parameters were both nullable ints so as far as the compiler was concerned this was all good.  Unfortunately it meant it passed my very large primary key into a parameter that was expecting a much smaller value – something that should have been very obvious from the error.

This post has been sitting in draft for so long that I have now had several more “events”.  The most recent was while doing a demo on AngularJS directives with my team at work.  I was trying to illustrate scope isolation and no matter what I tried I couldn’t figure out why the attribute from the directive specified in my HTML wasn’t making its way to my link function in the directive code.  When I turned off scope isolation and pointed at the parent scope, everything worked fine.  Even my more Angular-savvy team members were stumped.  Then came the “duh” moment – I actually had 2 directives nested inside each other.  I was looking at the code in the inner directive and passing data into the outer directive.  The HTML of the inner directive wasn’t passing the value from the parent.  Yet another zebra chase.

So in conclusion, drop the attitude!  When you have problems, they are most likely the most basic and amateur issues.  Reign in those horses and let the zebras smack you in the head, don’t go looking for them.

UPDATE: 29 January 2019

I have done it again! I was working on a new framework feature and couldn’t get my test to work. I just KNEW it was a problem with the third-party software I was using. More specifically, I wasn’t sure what I was trying to do was even supported. Then somehow I managed to get everything to work directly with said software. After days of struggling, I discovered I was writing the final portion of my test immediately after testing deleting the exact thing I am trying to test. As usual, my code was doing EXACTLY what I told it to do.

On a side note, since the original post, I was introduced to the concept of rubber duck code reviews. I highly recommend this and your coworkers will thank me.