How to learn a new codebase

One of the first things you'll do after starting a new job in software engineering is start learning the codebase.

Over time, a project can easily grow to hundreds of thousands of lines of code, and millions of lines isn't uncommon. Learning a codebase that large can be quite overwhelming.

Whilst you won't need to learn an entire codebase, you will be expected to be able to work in any part of it. For example, at my previous job there were parts of the software I'd never worked on; I didn't know that code, but I was comfortable enough to work on those areas if the need had arisen.

Reading the unit tests

Unit tests are beneficial for a number of reasons. They provide automated checks that functions still behave as originally intended, and they also show you the logic behind those functions: what goes in, and what should come out.

For example, take a function called add_nums() and the unit test written for it, test_add_nums().
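
A hypothetical Python sketch of what add_nums() and test_add_nums() might look like. The default values a=15 and b=20 are assumptions (the text only implies that they add up to 35), and the seven assertions are ordered to match the line-by-line reading that follows.

```python
import unittest


def add_nums(a=15, b=20):
    # Hypothetical defaults (assumed): 15 + 20 == 35
    return a + b


class TestAddNums(unittest.TestCase):
    def test_add_nums(self):
        self.assertEqual(add_nums(), 35)             # 1: no arguments needed, returns 35
        self.assertEqual(add_nums(a=5), 25)          # 2: override a only (5 + 20)
        self.assertEqual(add_nums(b=5), 20)          # 3: override b only (15 + 5)
        self.assertIsInstance(add_nums(), int)       # 4: the result is an integer
        self.assertIsInstance(add_nums(1, 2), int)   # 5: still an integer with explicit arguments
        self.assertNotIsInstance(add_nums(), str)    # 6: never a string
        self.assertNotIsInstance(add_nums(), list)   # 7: never a list
```

Saved as a file named something like test_add_nums.py, this is picked up by `python -m unittest`, which discovers `test*.py` files in the current directory.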

From the test alone, we can learn a lot about the add_nums() function.

From the first line of the test_add_nums() method, we know that add_nums() can be called without any arguments, and that when it is, it always returns the integer 35.

From the second and third lines, we know that a and b are optional parameters with defaults, and that either can be overridden simply by passing a value for it.

Then, from lines 4 to 7, we know that add_nums() should return an integer, and never a string or a list.

You'll also often find hardcoded data in unit tests that gets passed in as parameters. This shows you what kind of data structures a function typically receives, and what the expected outcome is.
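
As an illustration, a test like the one below tells you, without opening the implementation, that an order is a dict with an "items" list whose entries carry "price" and "quantity" keys, and what total to expect. order_total() and SAMPLE_ORDER are hypothetical, invented for this sketch, and prices are assumed to be integers in pence.

```python
import unittest


def order_total(order):
    # Hypothetical function: sums price * quantity over an order's line items.
    return sum(item["price"] * item["quantity"] for item in order["items"])


# Hardcoded fixture: documents the shape of the data order_total() expects.
SAMPLE_ORDER = {
    "id": 1042,
    "items": [
        {"sku": "PEN-01", "price": 150, "quantity": 2},  # prices in pence (assumed)
        {"sku": "PAD-07", "price": 499, "quantity": 1},
    ],
}


class TestOrderTotal(unittest.TestCase):
    def test_order_total(self):
        # 150 * 2 + 499 * 1 == 799
        self.assertEqual(order_total(SAMPLE_ORDER), 799)

    def test_empty_order(self):
        # An order with no items totals zero.
        self.assertEqual(order_total({"id": 1, "items": []}), 0)
```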

Finding entrypoints

Pick a function at random and trace it back to where it's first called. In my preferred IDE, PhpStorm, you can do this easily by Ctrl+Clicking (Cmd+Click on macOS) the method name. If that fails, a project-wide search for the function name never hurts.

Keep repeating that process. For example, Function C could be called by Function B, which is called by Function A, which is in turn called by Function Main.

Like so:
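
A minimal Python sketch of that chain, with hypothetical function names and a hypothetical string passed down through it. Starting from function_c() and asking "what calls this?" three times leads back to main(), the entry point.

```python
# Hypothetical call chain: main() -> function_a() -> function_b() -> function_c()


def function_c(message):
    # The innermost function: where you might start reading.
    return message.upper()


def function_b(message):
    return function_c(message + "!")


def function_a(message):
    return function_b("hello, " + message)


def main():
    # The entry point: everything above is ultimately reached from here.
    print(function_a("world"))


if __name__ == "__main__":
    main()
```

Running it prints HELLO, WORLD!. Editing the "world" string in main() and re-running is a quick way to confirm which path the data actually takes.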

In practice it usually won't be laid out this neatly: the caller might be a method on another class, in another file entirely. Even so, tracing calls like this is a great way to find your way around the codebase. While you're doing it, it also helps to tweak any data being passed through (change a string, for example) and watch where the change shows up.

A debugger works really well for this, as it lets you inspect the stack trace. Set a few breakpoints and take notes when they're hit.
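
If you can't attach a debugger, Python's standard traceback module can report the same call path from inside a function. In this sketch, caller_chain() and the function names are hypothetical; traceback.extract_stack() returns the current stack as a list of frame summaries, outermost frame first.

```python
import traceback


def caller_chain():
    # extract_stack() starts from the frame that called it (this one) and
    # lists frames outermost first; drop the last entry, caller_chain itself.
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def function_c():
    chain = caller_chain()
    # Print the path that led here, e.g. "<module> -> main -> function_a -> ..."
    print(" -> ".join(chain))
    return chain


def function_b():
    return function_c()


def function_a():
    return function_b()


def main():
    return function_a()


if __name__ == "__main__":
    main()
```

Run as a script, this prints <module> -> main -> function_a -> function_b -> function_c. For interactive exploration, Python's built-in breakpoint() pauses execution in pdb, where the where command prints the same stack.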

Ask for help

This is one you should do often. Nobody works in the codebase alone, so lean on your teammates: ask them for guidance, and ask as many questions as you can.

A second benefit is that you get to know the other developers early on, and you get used to relying on them, and to being relied on.

There are many ways to learn a codebase, but these are the ones I always rely on, especially reading the unit tests. In an ideal world, documentation would be kept up to date and we could simply read it, but I've yet to see that happen.