Navigating an Unfamiliar Codebase

Getting up to speed on a large codebase is difficult, so this week we’ll focus on reading unfamiliar code. Over the course of your career you will read far more code than you write, because writing good, maintainable code requires understanding how it integrates with the rest of the codebase. Reading code is a skill that you can learn and improve over time. It takes practice, but there are tools and techniques that you can use to build this skill.

It can be frustrating to read unfamiliar code because we didn’t write it ourselves. We usually have strong opinions about how it should have been written, whether that’s a different coding style, different naming conventions for variables, or a different way of solving the problem. What may not be apparent when you’re reading code, however, is that the original author may have had constraints to work around. They may have needed to solve the problem in a specific way for reasons you don’t know about. Unless the original author is still around for you to ask, it may be impossible to know why code was written a certain way.

Here are some tools and techniques that I’ve used successfully throughout my career in order to dig through codebases and understand not only what the code is doing, but also the context for why it was written.

Use source control to your advantage

  • I use git grep all the time. It works like grep, but by default it only searches the files that Git tracks, so build output and other untracked or ignored files stay out of the results. If I need to find where a variable, class name, or string is used in the codebase, this is what I reach for first.

  • I use git blame whenever I’m reading through a file that I’m unfamiliar with. It helps me answer questions like “Who wrote this code?” and “When was this change made?” Knowing these things can help you find the right people to talk to or help you track down a bug due to recent changes to a file.

  • I use git log -p <filename> to show me the history of every change that has been made to a file, newest first, along with the diff for each commit. What were the most recent changes to this file and who made them? When were they made?
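
As a rough sketch, the three commands above can be wrapped in small shell functions. The function names are made up for illustration; the git subcommands and flags are standard, and --follow is an optional extra that keeps the history going across renames.

```shell
# Hypothetical helper names; only the git commands themselves are real.

# List every tracked line that mentions a fixed string, with line numbers.
where_used() {
  git grep -n -F -e "$1"
}

# Show who last changed each line of a file, and on what date.
who_wrote() {
  git blame --date=short -- "$1"
}

# Show every commit that changed a file, newest first, with its diff.
file_history() {
  git log -p --follow -- "$1"
}
```

Run from inside a repository, where_used retry_limit would list each file and line that contains the (hypothetical) name retry_limit, and file_history on a path would page through that file's commits.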

Use your IDE to your advantage

  • I prefer JetBrains IDEs because they have excellent support for indexing your codebase. This allows me to use features like “Go to definition” and “Find usages” to help me navigate the codebase.

  • If I come across a function call that I’m unfamiliar with, I can use “Go to definition” to jump straight to the place in the codebase where that function is defined. From there I can see the function signature and read through its logic to see what it’s doing.

  • If I am looking at a function and want to know where it’s used in the codebase, I can use “Find usages” to show me all the locations in the codebase that call this function. This is helpful for refactoring and understanding where certain business logic is used in the codebase.

Find the entry points

Look for the main file where your application boots itself. It’s helpful to know how your program starts up and configures itself. If it’s a web application you’ll also want to look at the routes to see which endpoints or URLs are available.
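
When the main file isn’t obvious, git grep can help narrow it down. The sketch below assumes a few common conventions: a Python __main__ guard, a Go or C-style main function, a Java main method, and Flask- or Express-style route declarations. The function names and patterns are illustrative guesses, so swap in whatever your own stack uses.

```shell
# Hypothetical helpers; the patterns are guesses at common conventions.

# List tracked files that contain a typical program entry point.
find_entry_points() {
  git grep -l -E \
    -e 'if __name__ == .__main__.' \
    -e 'func main\(' \
    -e 'public static void main'
}

# List route declarations in the style of Flask or Express, with line numbers.
find_routes() {
  git grep -n -E \
    -e '@app\.route\(' \
    -e 'router\.(get|post|put|delete)\('
}
```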

Read the comments

This one seems like a no-brainer, but it’s easily overlooked. Sometimes I get caught up trying to figure out how the logic works, or why certain logic exists, when there is a comment right above the code explaining it. Make sure you read the comments, but be careful. A good comment gives you enough context to understand why the code exists and why it was written a certain way, but a comment that wasn’t kept up to date as the code changed can cause more confusion than it clears up. Use git blame to see if the code was modified more recently than the comment.
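
One way to make that comparison is to blame only the lines you care about. This is a minimal sketch: blame_range is a made-up helper name, while -L and --date=short are standard git blame options.

```shell
# Hypothetical helper: blame only lines START..END of FILE with short dates,
# so the date on a comment can be compared with the code beneath it.
# Usage: blame_range FILE START END
blame_range() {
  git blame --date=short -L "$2,$3" -- "$1"
}
```

If the comment line shows an older date than the code under it, treat the comment as a hint to verify and not as a description you can rely on.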

Read the tests

The tests provide a specification for how the code should work, and they show which edge cases the author considered when writing it. This gives you an idea of how a function is expected to behave and, if you’re looking at integration tests, how it should be used.
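
To find the tests that exercise a particular function, git grep can be limited to test files. The sketch below assumes test files have “test” somewhere in their path, which is a common convention but not a universal one; tests_for is a made-up helper name.

```shell
# Hypothetical helper: list tracked files whose path contains "test"
# and whose contents mention the given name.
tests_for() {
  git grep -l -F -e "$1" -- '*test*'
}
```

For example, tests_for apply_discount (a hypothetical function name) would list the test files worth opening first.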

Spend some time this week trying out these tools and techniques when you’re working on a new feature or tracking down a bug in the codebase. The more you use them, the quicker you’ll learn what works for you and you’ll find your own style for navigating large codebases.

First published Monday Apr 25, 2022 by Dave Glassanos