Joost's Dev Blog: What most young programmers need to learn

Sunday 4 January 2015

What most young programmers need to learn

In the past 7.5 years I have supervised over a dozen programming interns at Ronimo and have seen hundreds of portfolios of students and graduates. In almost all of them I saw the same things that they still needed to learn. One might expect that I think they need to learn specific techniques, algorithms or math, or other forms of specific knowledge. And of course they do, but in my opinion that is never the main thing. The main thing they need to learn is self discipline: the discipline to always write the clearest code you can, the discipline to refactor code if it becomes muddy through changes later in development, and the discipline to remove unused code and add comments.

Most of the time I spend supervising programming interns is spent on these topics. Not on explaining advanced technologies or the details of our engine, but on making them write better code in general. I always ask applicants what they think is important in being a good programmer and they usually answer that code should be clear, understandable and maintainable. That is indeed what I want to hear, but it is very rare for a young programmer to actually consistently follow through with that.

Keeping this in mind requires self discipline, because it means not stopping "when it works". If all the variables had the wrong names, the code could still function perfectly, but it would be super confusing. The step from functional code to clear code brings very little reward in the short term: it worked already and after cleaning it up it still works. That is why discipline is required to take this step. That is also why doing an internship is so useful: a good supervisor is vigilant about code quality (even though the definition of "good code" might of course differ per programmer) and thus forces the intern or junior to always take that next step.

Let me give a few examples of the kinds of things I often see in code written by starting programmers:

Liar functions/variables/classes

These are functions, classes or variables that do something other than what their name suggests. Their name is a lie. It is very obvious that names should be correct, but to my surprise it is quite common for names to be completely off.

An example I recently encountered in code written by a former intern was two classes: EditorGUI and EditorObjectCreatorGUI. This is code that handles the interface in our editors. To my surprise it turned out that the code that handled the button for creating new objects was in EditorGUI, while EditorObjectCreatorGUI only handled navigating through different objects. The exact opposite of what the naming suggests! Even though the code was relatively simple, it took me quite a while to understand it, simply because I started with a completely wrong assumption based on the class names. The solution in this case is really simple: rename EditorObjectCreatorGUI to EditorObjectNavigationGUI and it is already much, much more understandable.

This is something I see a lot: names that are simply incorrect. I think this often happens because code evolves while working on it. When the name was chosen it might have been correct, but by the time the code was finished it had become wrong. The trick is to constantly keep naming in mind. You have to always wonder whether what you are adding still fits the name of the function or class.
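As a minimal sketch of how a name can drift into a lie, consider the Python snippet below. The class, field and function names are hypothetical and only serve as illustration; they are not taken from the editor code mentioned above. The first function started out returning all visible objects, a later change added a selection filter, and the name was never updated.

```python
from dataclasses import dataclass


@dataclass
class EditorObject:
    name: str
    visible: bool
    selected: bool


# Liar: the name promises "visible objects", but after a later change
# the function also drops everything that is not selected.
def get_visible_objects(objects):
    return [o for o in objects if o.visible and o.selected]


# Honest version: same behaviour, but the name was updated with it.
def get_visible_selected_objects(objects):
    return [o for o in objects if o.visible and o.selected]


objects = [
    EditorObject("tree", visible=True, selected=False),
    EditorObject("rock", visible=True, selected=True),
    EditorObject("lamp", visible=False, selected=True),
]

# A caller trusting the first name would expect "tree" to be included.
names = [o.name for o in get_visible_selected_objects(objects)]
print(names)
```

Both functions do exactly the same thing; only the second one tells the reader what to expect.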

Muddy classes

Another problem I see is muddy classes: classes that do a lot of unrelated things. Again this is something that happens as you keep working on the same code. New features are added in the easiest spots and at some point classes become bloated with all kinds of unrelated behaviour. Sometimes the bloating is not even in the size of the classes: a class might be only a few hundred lines but still contain code that does not belong there.

An example of how this can happen is a GUI class that for some reason needs to analyse what textures are available (maybe because there is a button to select a texture). If the GUI class is the only class that needs the results of this analysis, then it makes sense to do that in the GUI class. However, suppose that later some totally unrelated gameplay class also needs that info. So you pass the GUI class to that gameplay class to query the texture information. At this point the GUI class has grown to be something more: it is also the TextureAnalyser class. The solution is simple: split the texture analysis off into a separate TextureAnalyser class that can be used by both the GUI class and the gameplay class.
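The split described above could look roughly like the following Python sketch. Only the name TextureAnalyser comes from the text; TextureSelectionGUI, LevelRules and all method names are assumptions made up for this illustration.

```python
class TextureAnalyser:
    """Knows which textures are available; nothing about GUI or gameplay."""

    def __init__(self, filenames):
        self._filenames = list(filenames)

    def available_textures(self):
        return sorted(self._filenames)


class TextureSelectionGUI:
    """GUI code: only turns the analyser's results into button labels."""

    def __init__(self, analyser):
        self._analyser = analyser

    def button_labels(self):
        return [f"Select {name}" for name in self._analyser.available_textures()]


class LevelRules:
    """Gameplay code: asks the analyser directly, never the GUI."""

    def __init__(self, analyser):
        self._analyser = analyser

    def has_texture(self, name):
        return name in self._analyser.available_textures()


analyser = TextureAnalyser(["TreeBackground.dds", "Rock.dds"])
gui = TextureSelectionGUI(analyser)
rules = LevelRules(analyser)
print(gui.button_labels())
print(rules.has_texture("Rock.dds"))
```

The gameplay class no longer needs a reference to the GUI class at all, and each class name again describes everything the class does.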

The general rule of thumb to avoid this problem is to always wonder: does the functionality that I am adding here still fit the name of the class? If not, then the class either needs to be renamed, or it needs to be split into separate classes or the code needs to go into a different class.

It is usually a Bad Smell if you cannot come up with a fitting name for your class. If you cannot describe what a class does in its name, then maybe what it does is too muddy. It might need to be split into parts that make more sense and can actually be described with a proper name.

Oversized classes

This one is really similar to the muddy classes above: over time more and more is added to a class and it gets bloated. In this case however it all still makes sense to be in one class, but the class simply grows too big. Gigantic classes are cumbersome to work with. Bugs slip in easily as there is a lot of code manipulating the same private member variables, so there are a lot of details one can easily overlook.

Splitting a class that has grown too big is quite boring work. It can also be a challenge if the code in the class is highly intertwined. Add to this that it already works and that fixing it adds no new functionality. The result is again that it requires serious self discipline to split a class whenever it becomes too big.

As a general rule of thumb at Ronimo we try to keep classes below 500 lines and functions below 50 lines. Sometimes this is just not feasible or sensible, but in general whenever a class or function grows beyond that we look for ways to refactor and split it into smaller, more manageable pieces. (This makes me curious: where do you draw the line? Let me know in the comments!)

Code in comments

Almost all sample code that applicants send us contains pieces of code that have been commented out, without any information on why. Is this broken code that needs to be fixed? Old code that has been replaced? Why is that code there? When asked, applicants are usually well aware that commented-out code is confusing, but somehow they almost always have it in their code.

Parallel logic and code duplication

Another problem that I often see is similar logic being repeated in several spots.

For example, maybe the name of a texture gives some information as to what it is intended for, like “TreeBackground.dds”. To know whether a texture can be used for a tree we check the filename to see whether it starts with the word “Tree”. Maybe with the SDK being used we can check that really quickly by just using filename.beginsWith("Tree"). This code is so short that if we need it in various spots, we can just paste it there. Of course this is code duplication and everyone knows that code duplication should be avoided, but if the code being duplicated is so short, then it is tempting to just copy it instead. The problem we face here is obvious: maybe later the way we check whether a texture is fit for a tree changes. We then need to apply shotgun surgery and fix each spot separately.

A general rule of thumb here is that if code is very specific, then it should not be copied but put in a function. Even if it is super short and calling a function requires more code than doing it directly.
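A minimal Python sketch of this rule of thumb, assuming the tree check from the example above: the function name is_tree_texture and the constant are hypothetical, and Python's str.startswith stands in for the SDK's beginsWith.

```python
TREE_TEXTURE_PREFIX = "Tree"


def is_tree_texture(filename):
    """Single place that decides whether a texture may be used for a tree."""
    return filename.startswith(TREE_TEXTURE_PREFIX)


# Every call site asks the same question through the same function.
print(is_tree_texture("TreeBackground.dds"))
print(is_tree_texture("RockBackground.dds"))
```

If the rule for recognising tree textures ever changes, only the body of this one function needs to be touched.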

All of the things discussed in this blogpost are really obvious. Most of these things are even taught in first year at university. The challenge is to make the step from knowing them to actually spending the time to always follow through with them, to always keep them in mind. This is why the most important thing that all programming interns learn at Ronimo is not knowledge, but self discipline.

80 comments:

Joris van Leeuwen, 4 January 2015 at 20:56
Really nice post, thank you! In Unity3d C# we at Little Chicken try to keep our classes below 400 lines of code. No real conventions for methods, but they should fit on a landscape monitor in my opinion ^^
Roger, 5 January 2015 at 09:50
Thank you for this. As a programmer who started full-time six months ago, this resonated so much with me. The transition was hard (and still is). Projects become more and more bloated and deadlines come closer and closer. I've been rushing and refactoring at the end when I should be doing it throughout my workflow. Self discipline sums it up very well. Correct does not mean done.
Mike, 5 January 2015 at 11:12
Anyone beginning to write professional code should read "Clean Code" by Robert C. Martin. It would make all code so much simpler.

I've actually written a blog post recently about clean code. It's targeted at beginners and advanced programmers alike.

http://yourcodesucksexception.blogspot.com/2014/11/your-code-sucks.html

Luke, 5 January 2015 at 14:02
You didn't mention testing. Tests first, wild implementation later. If we don't emphasise the importance of this we're going to be dealing with a lot of untested code handling our private information in the future.
Anonymous, 5 January 2015 at 14:17
Thanks for this very good post. I especially like your insight that "knowledge" is not the problem, but "self discipline". This had not occurred to me so far but I think you're right.

I made similar observations and came up with the "10 commandments" :-)
https://larsxschneider.github.io/2013/08/25/ten-commandments/
Anonymous, 5 January 2015 at 15:28
You're forgetting thorough input validation. Without it, regardless of your layout and functions, your program will do something other than what it was intended to do.
Anonymous, 5 January 2015 at 15:49
Regarding code duplication, you could use symbolic names (not sure what they are called) so long as the scheme stays the same, i.e.

Define a name for the substring you check against and declare its actual value just once. If you simply change your scheme from Tree... to Foliage..., you can then change what substring you're checking against in one place.

Of course this approach doesn't work if you change from startswith to endswith, but it's up to the programmer to declare such a change valid and/or necessary.
Andre Claassen, 5 January 2015 at 15:59
Often, young programmers start with fixing bugs. They are anxious about changing the code of their "masters". They want to fix the bugs with minimal changes and impact. They are socialized and rewarded for not refactoring.

Over time, the code becomes messy.
James King, 5 January 2015 at 16:47
How timely, I've been preparing a talk on this very subject. The reason behind this phenomenon, I think, is not a fault of inexperience but our lack of ability to teach programming well. Programmers, young and old, have a difficult time recognizing what good code looks like... much as writers once did (and still do when they're just starting out). The solution that William Burroughs took up in teaching good writing is to teach comprehensive reading in the hopes that students could recognize and internalize good writing. The same could be true for programming as well.
Anonymous, 5 January 2015 at 16:50
"As a general rule of thumb at Ronimo we try to keep classes below 500 lines and functions below 50 lines."

It depends what language/environment you're using, but these limits seem pretty big to me. On a 15" laptop (not an ideal working environment but a common one), it means you won't be able to see more than one function at a time.

Besides, unless you're writing in a low-level language, 50 lines is a lot of work -- it's hard to imagine a single un-decomposable function needing that many lines.
Eric P., 5 January 2015 at 16:55
One screen height is my absolute maximum for functions. Classes are a bit harder to pin down to a maximum other than "is this code still relevant".

Honorable mention for this list: code organization. This applies to both the source code itself (arrangement of methods, etc) and project folder/file structure. It's much easier to debug/grok someone else's code if it reads as you would expect without a lot of guessing/jumping around.
Anonymous, 5 January 2015 at 18:24
"All of the things discussed in this blogpost are really obvious. Most of these things are even taught in first year at university."

Aren't these statements contradictory? People don't go to universities to be taught things which are obvious. I studied computer science at an American university (which is consistently rated in the top 10 in the world) and none of these was ever taught to me.

I think these things fall into an intersection of "courtesy" and "memory". A lot of programmers have a mind for remembering and dealing with 1000 tiny details all at once. A lot of programmers haven't worked with other people enough to know that other people can't. Together, these mean that many programmers simply don't appreciate that they need to document their work, not lie with their naming, write functions that are readable, etc.

Unfortunately, at the organizational level, this is a latch. Once an organization is infected with extremely technical people who don't think of others, it's virtually impossible for anyone different to join the organization. But if an organization begins with people who write clean code, it's very easy for people to join who will mess it up.
Shane Curcuru, 5 January 2015 at 19:20
Excellent advice - for new programmers and old ones alike. While there are millions of "how to code" pointers out there, it's always useful to see one that has real life examples, both of how you got to that point, and why it's important to step beyond the "it finally works" checkin and checking in the updates to make it right.

I humbly suggest your next blog post: How entry level engineers can work productively with management to ensure these steps get recognized and instilled in corporate culture. That's the really hard part - both for engineers as well as managers.

Thanks!
Unknown, 5 January 2015 at 19:34
You never stop getting surprised :-)

Given this "new" knowledge, do you handle new recruits differently? If so, did it have any effect?

Great post anyway.
Priom, 5 January 2015 at 19:48
Helpful post. Self discipline is indeed one of the greatest skills for those of us who work in the IT sector. Recently I read 'Applying UML and Patterns' by Craig Larman, who emphasizes some of the points you mentioned through his GRASP and the Design Patterns of OOP.
Anonymous, 5 January 2015 at 20:57
Take a look at just about any Git repo, written by anyone between the ages of 5 and 95, and you will soon realize that these young kids are lacking these disciplines because there are a whole lot of older programmers out there who are failing to teach them!
Edwin, 6 January 2015 at 00:34
Really neat post, unfortunately I recognize myself quite a bit in this as a student getting closer and closer to graduation. Most of these points get taught in university but rarely put to the test. Code that has been commented out and class size are probably the things from this list that get checked the most often, but your view on code duplication isn't even taught that strictly at all.

Your post has actually gotten me pretty interested in doing my final internship at Ronimo.
Anonymous, 6 January 2015 at 10:24
Personally I tend to tolerate much larger functions than that, if they are clearly divisible. For example, I have some functions in my personal codebase that are over a thousand lines long, except that each is really 30 individual pieces that don't interact with each other (registering many related event handlers where each handler is totally independent).
Unknown, 6 January 2015 at 10:39
Really good advice, Joost van Dongen. I myself am an (almost) 14 year old game developer, and the Parallel logic and code duplication part in your post really makes sense. Since I work alone, I don't really keep my code that clean, but now since I have started selling Unity3D assets, it is forcing me to maintain coding conventions and clean code, which I think is a good thing ;)
Anonymous, 6 January 2015 at 14:09
I'm not young but I'm starting out in coding. I am not naturally good at math or physics. Will this be a big problem for me in starting a career in coding or software testing? Thanks.
Unknown, 6 January 2015 at 14:13
It is good to document standard coding practices with examples and pass them on to all programmers, from junior to senior, so they can verify their code against them like a checklist. This practice of verifying will make them more self disciplined.
Anonymous, 6 January 2015 at 14:32
I'm very bad at programming.
It would be great if you could guide me until I can write good programs on my own!
Anonymous, 6 January 2015 at 15:34
I agree with this topic. I am a senior programmer, and programmers of all levels, including experienced ones, often lack self discipline, in other words the ability to put rules on yourself. There are patterns to find, and sometimes those patterns are hard to find or see. In time, code becomes more separated (by concerns), more flexible, robust and maintainable.
Also, 80 percent of the programmers I work with don't understand the concept of core libraries (system-level and business-level libraries); they have code all over the place and keep building on top of bad code.
Anonymous, 6 January 2015 at 15:55
Joost:

This is spot on. If I ever get the chance to teach at university again, I believe I need to make the discipline aspect a major part of the lessons.

The worst example I ever saw came from code my manager's manager wrote several years before I inherited it. Unfortunately, he was an EE with little formal software engineering training. He implemented what was actually a state machine using if-elseif-elseif-else constructs spanning over 2000 lines. My manager later inherited the code and ended up getting a couple of very bad performance reviews because the code simply could not be fixed without refactoring and simplifying.

Your limits of 500 and 50 are reasonable. The number is a little arbitrary, but the description is correct. If you have to scroll back and forth through the file endlessly to try to see things, then it is way too large. This is a human factors issue. The human brain can only handle what it sees all at once. Once it has to do several operations, you lose effectiveness. In the "old days" we basically set the limit to what could be printed on an 8.5x11 piece of paper (turns out to be about 50 lines). Today that translates to what you can see on a screen in landscape mode.
Bearvarine, 6 January 2015 at 16:15
This is hilarious. Of course you are spot on in all your points. And most of the commenters agree more or less. But reality is quite different at most places I've worked in the past 20+ years. I have people even now that argue vehemently for "no break no fix" (if it isn't broken, don't fix it). The best way for a new programmer to get chewed out or dressed down is to make any kind of "unnecessary" code changes. I wish I was joking but I'm not. Ask around: many if not most places follow this atrocious policy.
Unknown, 6 January 2015 at 17:59
"The step from functional code to clear code brings very little reward in the short term". This is very correct. In fact sometimes refactoring can stop the original code from working and this brings more work but it will prevent code rot in the long term.
Anonymous, 6 January 2015 at 20:03
I'm very surprised to hear these comments, especially from seasoned programmers. 500 lines in a class only - really. I've worked on classes that were huge. Very well organized, very well written, but very large doing very specific business logic routines. Sure we could have thousands of little classes running around. Sooner or later that becomes a major problem. We have all seen the little single function call shoved into a class that should have been put together in one utility class. Writing clean good code is truly something to aspire to and most seasoned programmers will try their best to do it. Time, pressure, reality of job demands, may make code less than ideal for new programmers and seasoned ones alike. Personally my entire 25+ years coding has been in the business area. No game programming. You would not believe the incredible amount of pressure that can be put on the programmers to get something out the door because a marketer or sales person promised something that realistically could not/should not have been done. I’ve seen this so many times now in the electronic area, medical field, warehousing field, you name it, that it’s ridiculous. It would be great to take as much time as necessary to make the code very clean, elegant, and fast. It’s a serious balancing act.
Bob Stine, 6 January 2015 at 20:16
Good article!! One minor nit: when you write: " Their name is a lie. It is very obvious that names should be correct, but to my surprise it is quite uncommon for names to be completely off," I believe you mean "quite common" rather than "quite uncommon".
Chris Jacobi, 6 January 2015 at 21:16
Nice post, I agree. Nevertheless, I would like to excuse two "bad" habits:

Old commented-out code can stay until I'm convinced the new code has proven to be better, until it is not needed anymore, or until I forget why it's there (whichever happens first). Hopefully it gets removed fast.

Size is not an issue. Clearness is. Function-size becomes an issue only when clearness suffers, or function-headers are scrolled out of view. For modules, coherency matters and not the number of functions and certainly not the size.
Anonymous, 7 January 2015 at 01:56
If code is commented out, it is not needed anymore. Therefore, you're safe to delete it. If you want that code back at a later time, use source control.
sandeep, 7 January 2015 at 02:35
any of that, but learn new things and be different,
Unknown, 7 January 2015 at 12:48
I am one of those young programmers. Interesting article. Check out an application I've been building and tell me what you think: http://mbithy.blogspot.com/2014/12/music-sounds-better-if-you-made-player.html
Dale P, 7 January 2015 at 17:03
Overall, let me say I agree with this article but rather than self-discipline, I think it is a question of habit. Is that the same as self-discipline? Perhaps; it depends on point of view, I suppose.

I always teach developers who have problems with writing good code to follow Fowler's Refactoring. Proper refactoring, as Fowler describes, allows developers to be productive while learning to be better OO programmers. Over time, you have to refactor much less because you think to yourself, "If I type this this way now, I just have to refactor it in 5 minutes." You soon find yourself stopping the bad code and writing so you don't have to refactor in the first place. Keep writing code but keep improving the code you already wrote and the code you're writing now by practicing and repeated refactoring until you find you're refactoring less and less.
Anonymous, 9 January 2015 at 19:24
This is a really good blog and I did send it to one of the interns on my team. Before I had even seen this blog, I basically told him the same thing about what it takes to be a good coder/developer. Consistency is the main thing, and I've read a lot of code in my career, as well as producing code for others to maintain. The thing I noticed most is reading through a lot of crap that makes no sense or is very hard to figure out. Good code is very easy to follow and does not require much thought to figure out. When it comes to formulating code, I learned a long time ago from one very intelligent senior developer (while I was a junior) that a line of code should be a singular execution. If a line of code contains multiple executions, it requires the mind to wander and the person to figure out what that line is doing. Also, time and again over the years, you hear others praising simplicity in implementation. How true that is, but it is harder to do than it sounds. To keep things simple, you definitely have to do just that: keep each line of code simple and self-documenting. You can document the hell out of the code if you want, but if the line of code still doesn't make sense and requires a huge amount of comments, it means that the code should be rewritten to make it simpler.

As for huge classes, I really don't agree with limits on the size of any code. As the blog states, it is impossible to always enforce a limit of 50 lines for methods or 500 lines for classes. The best way to look at it is to make things simple and make sure a method does only one task and not multiple tasks. The method can be unit tested this way. The unit of code can be as big or as small as you want; the smallest unit of code is a single line, which can be tested. For classes, an intern or a junior developer is likely not going to be able to do this very well, or at all, in terms of true object orientation. This takes time and experience until that concept is shown over the course of several projects. This is because OO's practicality is transparent in a framework and not developed by anyone other than the architect or a senior tech lead. I've seen in so many projects that even senior developers often use static global implementations rather than true OO for code re-usability. This is a bad approach even though it is simple and direct, because over time you see similar code popping up in different files across the whole system.

I totally agree with developing a good sense of coding with consistency through self discipline and good mentoring. It is important for senior developers to train interns and junior developers early and make sure coding standards are adhered to at all times. It would be totally wrong to either throw the person into the fire immediately or leave the person to learn things on his/her own. It is also wrong to ignore the intern and give that person nothing to do during the whole term. I mention this because I know some companies do that kind of crap, and it makes no sense except to claim research grant money from the government.
Spacemoses, 12 January 2015 at 13:46
To further clarify your point on "Code in comments", I would suggest that a remedy for that situation is version control. Go ahead and kill that code, you can always go back and see what was there before.
it, 3 March 2015 at 10:25
It is obvious that proper, clear and simple comments help us understand any code clearly.
RESTful, 9 September 2015 at 17:50
Most of the juniors I have met don't know about OOP apart from the definitions of class, object, abstraction and so on. Even though they know the definitions, they don't understand them, let alone use them in their code. We teach them these basic things by asking them to write games like Tetris, Ping Pong etc.