16 May Naming Things is Hard
“There are only two hard problems in Computer Science: cache invalidation and naming things.” – Phil Karlton
The above quote, while somewhat tongue in cheek, has the ring of truth. Naming things is hard. Even when you put enormous effort into consistent and clear naming, you will invariably end up with some muddy and inconsistent caverns in your code. And when you are unfortunate enough to inherit code from someone (or multiple someones) who put no effort into their naming, you’re in for a world of pain. If you have been programming for any length of time, you know what I am talking about.
Unfortunately, in addition to being difficult, naming things is Really Important. Everyone agrees that, sure, naming things properly is important. However, I get the impression that people feel clean naming is important to programming in the same way that a nice monitor is important: kind of important, but not something that will make or break a project. I disagree. I consider effective naming to be one of the pillars of quality software.
The moment when you are developing software and making naming decisions is the moment when those decisions matter least. The architecture is in your head, the problem domain is fresh in your mind, and whatever weird algorithm you’re coding makes sense. However, when software enters the maintenance phase of the SDLC, the longest phase by far, Naming Things becomes important. That is when somebody has to come along behind you and work with the complexity that any meaningful piece of software entails. Even if that somebody is you, the architecture will not be in your head, the problem domain may be a distant memory, and your weird algorithm will have turned into incomprehensible legacy babble that has been kludged 5 times. This is when a variable named “theNumber” really hurts you. Which number? Is it an index? Is it a semaphore? Is it the result of a computation? Is it a temporary variable for swapping? Questions like these turn a 5-minute fix into a 30-minute one, and a 4-hour minor feature into a 4-day one.
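To make the “theNumber” problem concrete, here is a minimal sketch in Python. Both functions are invented for illustration (they do not come from any real codebase) and compute exactly the same result; only the names differ.

```python
# Hypothetical example: the two functions below are behaviorally identical.

def calc(the_number, x):
    # Which number? A percentage? A fraction? An index?
    # The names give a maintainer nothing to go on.
    return x - x * the_number / 100


def price_after_discount(price_dollars: float, discount_percent: float) -> float:
    # The names state the role and the unit of each value,
    # so the formula can be checked at a glance.
    return price_dollars - price_dollars * discount_percent / 100
```

A maintainer reading a call to the second function can tell what the arguments mean without opening the function body. With the first, they have to read the arithmetic and guess.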
There are many different naming conventions. People have opinions on singular vs. plural table names, HumpBackNotation vs Underscores_Separating_Words, etc. I have my own opinions on these, but what is important is this: These conventions need to be agreed upon and consistent within your organization. Even if you make a poor naming decision, it is better to have an awkward convention consistent across your software than two (or three, or more!) conventions across your software.
But why does it matter?
Let’s take a common convention: Singular vs. plural table names. Ignoring which is “better,” let’s look at the diagram below, and imagine that the various tables were created by different developers at different times, all making off-the-cuff decisions about naming.
At a glance, this schema doesn’t seem to be too bad. There’s consistency in the autonumber artificial primary keys, and the naming seems pretty consistent, right? Wrong. This schema is already a mess. WorkOrders is plural, while Project and Person are singular. Then, to exacerbate the problem, the PersonWorkOrder table makes both names singular (when only one is), and the PeopleProjects table makes both plural, when neither is. Going deeper, the WorkOrders table has a primary key that uses the singular “WorkOrderId,” while the PeopleProjects table keeps the plural in its primary key “PeopleProjectsId.” As a result, PeopleProjects is inconsistent even within itself: it has one column where both People and Projects are plural, and two other columns where they are both singular. While this may seem like intentional misnaming, I assure you that I see FAR worse than this in legacy systems every day.
I contend that it is beyond human ability to remember these inconsistencies for any length of time. Humans process and remember by finding patterns and categorizing things, and your brain will simply refuse to store this kind of inconsistency. Sure, when you are deep in a project and working on it every day, you can get the hang of the inconsistent naming. But any time a new developer works on the project, a huge amount of time is wasted on it. And once the software goes into maintenance mode, every developer who has to revisit the project has to re-experience the pain of the naming, because it is so difficult to remember.
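The mixed singular and plural table names described above are easy to detect mechanically. The following is a rough sketch in Python, not a real linting tool: the table names are the ones mentioned in the text, while the function names and the “ends with s” plural heuristic are assumptions made purely for illustration.

```python
from collections import defaultdict

# Table names taken from the schema discussed above.
TABLE_NAMES = ["WorkOrders", "Project", "Person", "PersonWorkOrder", "PeopleProjects"]


def looks_plural(name: str) -> bool:
    # Crude heuristic (an assumption): a trailing "s" means the name is plural.
    # A real check would need a proper inflection library or a word list.
    return name.endswith("s")


def group_by_convention(names):
    # Bucket each table name under "plural" or "singular".
    groups = defaultdict(list)
    for name in names:
        key = "plural" if looks_plural(name) else "singular"
        groups[key].append(name)
    return dict(groups)


def is_consistent(names) -> bool:
    # Consistent means every name falls into the same bucket.
    return len(group_by_convention(names)) <= 1
```

Calling `group_by_convention(TABLE_NAMES)` puts WorkOrders and PeopleProjects in one bucket and Project, Person, and PersonWorkOrder in the other, so `is_consistent(TABLE_NAMES)` returns `False`. A check like this could run in code review, whichever convention the organization picks.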
And this is just one convention, and the simplest, most natural one at that. The situation becomes untenable when you are inconsistent with your API service names, your controller names, your method names, and your library names.
Are singular table names better than plural table names? I don’t know. Everybody has an opinion. But what I do know is that an organization has to make a decision and stick with it. Which convention you choose is not as important as having one and following it religiously.
I know that I just said the convention itself is not as important as whether it is followed consistently. I have one exception to this: Abbreviations. I believe abbreviations to be so destructive to software that they simply cannot be allowed. This is because it is impossible to be consistent with abbreviations. If you allow abbreviations in your software, invariably the same thing will be abbreviated multiple ways in the same piece of software, even if the entire thing is written by one person. Bear in mind I’m not talking about common acronyms here such as SSN, DOB, or ZIP (yes, that’s an acronym). I am talking about abbreviating words that are never abbreviated in common usage, just to save typing. Consider the schema below:
Unlike the earlier schema, this is a schema I actually saw in the wild. I am recreating it from memory, so some of the details might be off, but it really was quite close to this. The table “Employee” is not abbreviated, but the mapping table to branches has it abbreviated “Emp.” (Unrelated to abbreviations, the table maps to branches but is named as if it maps to Locations.) Company is abbreviated as “Comp” in one table name, and “Co” in another. “Master” is not abbreviated in the company master table, but it is in the primary key “CoMstrId.” “Location” is not abbreviated in the table name EmpLocation, but it is abbreviated in the primary key “EmpLocId.” I swear this is a real schema. The crown jewel is the abbreviation “SocSecNum.” If only there were a commonly used acronym we could use instead…
I can tell you from first-hand experience with this schema and many others like it that it takes a massive toll on a developer’s efforts to understand the system. It leads to countless bugs – especially when using loosely-typed languages. It is so easy to type “CoMasterId” instead of “CoMstrId” or “EmpLoc” when you should have typed “EmpLocation.”
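Here is a small sketch of how such a typo slips through in a loosely typed language, using Python. The column name “CoMstrId” comes from the schema above; the row contents, the “CompanyName” column, and the function names are made up for illustration.

```python
# A row as it might come back from a query, keyed by column name.
# The values and the "CompanyName" column are hypothetical.
company_row = {"CoMstrId": 42, "CompanyName": "Acme"}


def company_id_with_typo(row: dict):
    # "CoMasterId" looks perfectly plausible, but the real column is "CoMstrId".
    # dict.get returns None for a missing key instead of raising an error,
    # so the mistake passes silently and surfaces somewhere else later.
    return row.get("CoMasterId")


def company_id(row: dict):
    # Indexing with the correct key; a wrong key here would raise KeyError.
    return row["CoMstrId"]
```

Nothing fails at the line with the typo. The `None` simply travels onward until something far from the original mistake breaks, which is exactly the kind of bug that inconsistent abbreviations invite.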
“Well,” one may say, “just come up with a standard for what those abbreviations should be and enforce consistency across the organization.” Sure, that would work. But that’s a tremendous amount of effort. Are you really going to maintain a comprehensive dictionary of acceptable abbreviations for your systems? To what end? What is the benefit of all of that effort? To save typing? Typing is the least time-consuming thing a developer does. Naming things poorly to save typing time is like making a house out of balsa wood because it is cheaper to transport than lumber.
tl;dr: Naming things is hard. Organizational consistency with naming conventions is more important than the conventions themselves. Abbreviations in naming make this consistency extraordinarily difficult. Don’t use them.