bzztbomb

How to Navigate Large Codebases Quickly

To break up all of the high score posts, I’m just going to publish this old thing I never finished. I think it’s a good starting point for a good article or something. Every developer gets dumped into a large codebase they don’t know anything about at one point or another in their life. Torque has 1.3 million lines of code spread out over 2800 files, not including TorqueScript. The Windows XP tree is estimated to have 40 million lines of code.

How are you supposed to comprehend something this large?

The answer is simple. You don’t. It’s likely you’ll never know exactly how the whole system works. Even the people who wrote it lose track of the details over time. The best you can do is build a model of how each piece of the system works and correct that model as you debug and learn more about it. I asked a guy during an interview how he explores a new body of code. His response was that he liked to look at every file and function and see how it all works. Can you imagine trying to do that to Windows XP? You’d never have time to write new code!

How does one build a model of an application? I usually start by simply using it for a while. The goal is to grasp what the major bodies of functionality are. The next step is to pick a small modification and then use the tools below to figure out how to implement it. The point of the modification is to give you a filter through which you can start examining the source code. Without it, you’re just staring at a huge monster with no weapons to attack it!

Grep and Find in Files. This is probably the best tool for this kind of exploring. Most of the other tools are just refinements of it. Start by searching for keywords related to your modification. If you’re working on a GUI library and your modification is to create a “SuperButton”, just do a grep for “Button” through the source tree. You’re likely to get waaay too many hits for a simple query like that. This is where you turn to regular expressions. Change your search to “class.*Button” to find where Button and its descendant classes are declared, assuming C++ as the target language.
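If you don’t have grep handy, the same search is easy to sketch in a few lines of Python. This is only a rough illustration: the function name find_in_files and the default list of C++ file extensions are my own assumptions, not part of any real tool.

```python
import re
from pathlib import Path


def find_in_files(root, pattern, extensions=(".h", ".hpp", ".cpp", ".cc")):
    """Yield (path, line_number, line) for every line under root that matches pattern.

    Hypothetical helper: the name and the default extension list are
    assumptions made for this sketch.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not a C++ source or header file.
        if not path.is_file() or path.suffix not in extensions:
            continue
        # errors="replace" keeps the search going if a file has odd encoding.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path, line_number, line.rstrip()
```

Calling find_in_files("path/to/source", r"class.*Button") walks the tree and yields each matching line with its file and line number. With grep itself, the rough equivalent would be: grep -rn "class.*Button" path/to/source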

Go to definition. This is in Visual Studio and Eclipse. You can right click on any symbol and select “Go to definition”. This will take you to where the class or function is defined. It’s a shortcut for the search above.

Get callers (Eclipse) or Find All References (Visual Studio). This lets you see what uses a particular object or method. It’s useful for figuring out code flow and structure, and for estimating how large an effort changing that object or method will be.

Break things! I love just commenting out large bodies of code and watching the effects on the system. This is a great way to validate assumptions about what a body of code does. I just recently confirmed I was looking at the wrong piece of code by commenting it out and not seeing the effect I expected.

Remember project forces. The reasons why a particular piece of code was written are rarely purely technical. Some pieces of the system may have been written under different time constraints, with different goals that do not apply anymore, or just under different coding philosophies at the time. Keep this in mind while mining through a new codebase. It will help you understand the code (and also the team that wrote it).