Understanding Code Visually: Three Ways that Work


Monday, November 8th, 2010 - By Seth Rosen

In trying to help developers understand code, we have made sure to listen to what they have to say. We’ve noticed that many developers use UML-style diagrams to try to understand code. However, existing tools have many problems. I have tried a number of UML tools and found that the reverse-engineering capabilities in most of them were not helpful for code-understanding tasks. They focus on design tasks; their reverse-engineering features often just produce convoluted pictures full of useless information. Even developers who commonly use these tools ask us how best to understand code. Of course, there are many solutions to this problem, and the right one depends on the situation.

It is important to think about which specific perspectives you need to understand. Is the structure of the codebase in disrepair and in need of attention? Or are the underlying code concepts, features, and logic the most important? There seem to be three main types of diagrams people use to help understand code visually.

The Mile High View: Layered architectural diagrams can be really helpful for seeing how the main concepts in a project relate to one another. They provide a good starting point when other resources are unavailable. Examining the relationships between packages in a project can give a good sense of where the most important code lies. These diagrams can also reveal projects and packages that are poorly named or organized in a counterintuitive way.

A layered diagram showing how the different components of Apache Log4J are related. You can clearly see that the components varia, chainsaw, and jmx build on top of the core interfaces and logger class, along with the support utils.
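The layering idea behind such a diagram can be sketched in a few lines. This is only a rough illustration, assuming an acyclic dependency map: the class `LayerSketch`, its methods, and the dependency data are hypothetical, and the data is taken from the caption above rather than from an analysis of the Log4J source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: place each component one layer above its deepest dependency. */
final class LayerSketch {

    // Illustrative data only, based on the diagram caption (not on the Log4J source).
    static Map<String, List<String>> captionDependencies() {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("core", Collections.<String>emptyList());
        deps.put("utils", Collections.<String>emptyList());
        deps.put("varia", Arrays.asList("core", "utils"));
        deps.put("chainsaw", Arrays.asList("core", "utils"));
        deps.put("jmx", Arrays.asList("core", "utils"));
        return deps;
    }

    // Returns component name -> layer number, where layer 0 has no dependencies.
    static Map<String, Integer> computeLayers(Map<String, List<String>> deps) {
        Map<String, Integer> layers = new TreeMap<>();
        for (String component : deps.keySet()) {
            layerOf(component, deps, layers);
        }
        return layers;
    }

    // Assumes the dependency graph has no cycles; a cycle would recurse forever.
    private static int layerOf(String component,
                               Map<String, List<String>> deps,
                               Map<String, Integer> layers) {
        Integer known = layers.get(component);
        if (known != null) {
            return known;
        }
        int layer = 0;
        List<String> direct = deps.getOrDefault(component, Collections.<String>emptyList());
        for (String dependency : direct) {
            layer = Math.max(layer, layerOf(dependency, deps, layers) + 1);
        }
        layers.put(component, layer);
        return layer;
    }

    public static void main(String[] args) {
        System.out.println(computeLayers(captionDependencies()));
    }
}
```

With this data, core and utils land on the bottom layer and varia, chainsaw, and jmx sit one layer above them, which is the shape the diagram shows.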

The Core: It is important to figure out how the code works with regard to the main concepts. Class diagrams are exceptionally useful here. Pen and paper often works well enough, but tools can not only speed up the process but also help you save and share such diagrams. Inheritance relationships as well as class structure can become immediately apparent with a well-constructed class diagram. By limiting the scope of your diagram to a specific concept, you can easily avoid overwhelming, “everything but the kitchen sink” style diagrams.

A class diagram showing the hierarchy model of Apache HttpClient, specifically the Http Methods package with all the HttpRequest classes. Notice that the POST and PUT requests inherit from HttpEntityEnclosingRequestBase, whereas GET and others do not.
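To make the inheritance point concrete, here is a minimal sketch of that shape in code. The classes below (`RequestBase`, `EntityEnclosingRequestBase`, `GetRequest`, `PostRequest`, `PutRequest`, `HierarchySketch`) are simplified, hypothetical stand-ins written for this illustration; they are not the HttpClient source or its API.

```java
/** Hypothetical stand-in for the common base of all request classes. */
abstract class RequestBase {
    abstract String method();
}

/** Hypothetical stand-in for requests that can carry a body. */
abstract class EntityEnclosingRequestBase extends RequestBase {
    private String body = "";

    void setBody(String body) {
        this.body = body;
    }

    String body() {
        return body;
    }
}

// GET extends the plain base, so it has no body-related methods at all.
final class GetRequest extends RequestBase {
    @Override
    String method() {
        return "GET";
    }
}

// POST and PUT share the body handling through their common parent.
final class PostRequest extends EntityEnclosingRequestBase {
    @Override
    String method() {
        return "POST";
    }
}

final class PutRequest extends EntityEnclosingRequestBase {
    @Override
    String method() {
        return "PUT";
    }
}

final class HierarchySketch {
    // The type hierarchy alone answers "can this request carry a body?".
    static boolean canCarryBody(RequestBase request) {
        return request instanceof EntityEnclosingRequestBase;
    }

    public static void main(String[] args) {
        RequestBase[] requests = { new GetRequest(), new PostRequest(), new PutRequest() };
        for (RequestBase request : requests) {
            System.out.println(request.method() + " can carry a body: " + canCarryBody(request));
        }
    }
}
```

Reading this from source means opening five classes; the class diagram shows the same split at a glance.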

Key Use Cases: By tracing at least one key use case for your app you can gain a great deal of insight. You can likely get the most important use cases from anyone on your team, and stepping through them will be really helpful. Most IDEs provide good ways to navigate the source quickly. This is often enough for simple logic, but more complex scenarios may require sequence diagrams. Some sequence diagramming tools allow you to focus on the most important aspects of the logic you are interested in. Complex sequences of method calls, conditionals, and loops can often be reduced to an elegant diagram that will illuminate any potential bugs or unintended consequences of your code.

A sequence diagram describing the procedure for searching a Lucene 3.0 index. Notice that the collect method is inside an anonymous inner class (Collector) and is responsible for actually outputting the results of the search.
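The callback pattern in that diagram is the kind of control flow that is hard to follow by reading alone. Below is a small sketch of the same shape, assuming a toy in-memory searcher. `SketchCollector`, `SketchSearcher`, and `SearchSketch` are hypothetical names invented here and are not Lucene's API; real Lucene searching involves an index, queries, and scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical callback type, loosely modeled on the Collector in the diagram. */
abstract class SketchCollector {
    abstract void collect(int docId);
}

/** Hypothetical in-memory searcher that reports each match through the callback. */
final class SketchSearcher {
    private final List<String> documents;

    SketchSearcher(List<String> documents) {
        this.documents = documents;
    }

    void search(String term, SketchCollector collector) {
        for (int docId = 0; docId < documents.size(); docId++) {
            if (documents.get(docId).contains(term)) {
                // Control jumps back into the caller's anonymous class here.
                collector.collect(docId);
            }
        }
    }
}

final class SearchSketch {
    static List<Integer> matchingDocIds(List<String> documents, String term) {
        final List<Integer> hits = new ArrayList<>();
        new SketchSearcher(documents).search(term, new SketchCollector() {
            @Override
            void collect(int docId) {
                // The results are actually gathered here, not inside search().
                hits.add(docId);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("log4j appender", "lucene index", "lucene search");
        System.out.println(matchingDocIds(docs, "lucene"));
    }
}
```

The caller hands an anonymous class to the searcher, and the searcher calls back into it for every hit. A sequence diagram draws that round trip explicitly, while in the source it is spread across two classes.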

When originally confronted with the problem of understanding code visually, I was left unsatisfied by the array of tools available. Many showed too much information or were difficult to use, and were therefore unhelpful for understanding code. Some say that code inherently cannot be visualized in a useful way. We at Architexa are intent on challenging this belief. We have attempted to address the problems inherent in understanding a complex codebase by creating an easy-to-use dynamic visualization tool. Let us know your thoughts.

 

19 Comments

  1. david croydon says:

    I think the main problem with OO is that it does not model time clearly: indeed it actually fractures the code so that every new ‘event’ has to be named and there is no necessary clue from the code what order the ‘names’ take place. There was, once, a diagram/design tool that solved this (Jackson Structured Programming) but for various reasons it has fallen by the wayside.

    • Seth Rosen says:

      David, It’s true that many diagrams fail to model time properly. It is always going to be a challenge to be able to visualize run-time information alongside structural/inheritance data. Perhaps my description of ‘Key Use Cases’ wasn’t clear enough, but I think sequence diagrams can play a very important role in visualizing the control flow of the code.

      We are also thinking about better ways to model event handling, and I will be sure to research the approach you mentioned.

  2. HB says:

    Interesting article. We were recently required to document a system with class diagrams along with some descriptions of the main classes. After we did so, the people who asked for it told us they were not able to grasp the control flow anyway, heh.

    • Seth Rosen says:

      HB, Even with the most carefully designed class diagrams, control flow can be difficult to follow. I prefer to use a class diagram to show the main classes/methods involved in a concept and then use individual sequence diagrams to show the control flow. Please see my response to David for more insight into this problem.

      • HB says:

        I know. What I meant is that we had to waste several hours making some documents that were not going to solve the problem.

        In the end we had to make two more documents, explaining in more detail the control flow of the system, and made some simple samples, just to get a reply asking us to resume our work as the piece of the system to document was out of their knowledge area.

  3. Joe Klemmer says:

    Interesting.

    I like where you are going with this. There’s one thing that needs to be considered when doing visual representations of non-visual information. People process images in different ways depending on various psychological, physiological and educational variables. Personally, the “Mile High View” and “Key Use Cases” look pretty, but make little sense to me. “The Core,” however, jumps right out at me and instantly expresses the idea of what the code is supposed to do. The fact that you are making these different formats available (at least that is what it looks like) shows you have done a lot of work on this issue.

    But are three visual representations enough? Are the end result images targeted for developers or non-developers? Should both audiences be covered in the first place? I am just a code monkey so I cannot really say. It is, as I said, interesting. Can’t wait to see where this goes.

    • Seth Rosen says:

      Joe, Thanks for the great response! We have been very concerned with the diagrams that are useful to people in different situations. I have just modified the captions in the post to provide a clearer picture of the benefits each diagram can provide.

      I have mainly been interested in the developer’s point of view, and this post reflects that. Non-developers may require slightly modified diagrams, but we believe that simplified versions of any of the above diagrams, with attached written documentation/comments, can be used for many purposes.

  4. BobbyC says:

    Just a technical note. The image gallery gimmick you use is just not working for me. I don’t know if it’s my 1024×768 screen size or what, but I tried it on the latest version of Firefox, Chrome, and Opera. All of them clip off the left side of the wide pictures at maximum (actually readable) zoom.

    I think a big part of understanding visual code understanding is being able to see the visuals. 🙂

  5. Tanner says:

    I actually use a little of all three of these in my daily modeling needs and they work quite well.

    99% of the time I am using “Core” diagrams to express my thoughts and requirements and to make sure that nothing gets left out when designing software for our clients. Most of my work is piecemeal upgrades to existing systems though so this type of model tends to hang out a lot on my desk.

    When I run into random issues that can’t easily be debugged I use a type of “use case” diagram to record on paper the steps taken in an application, what database stuff gets updated, and how that affects other applications polling for the same data. Sometimes it works. Sometimes it doesn’t. My application’s users are not known until right before software delivery so I can’t do drive-by “use case” testing with them because a lot of time they are geographically quite far away.

    I need to do more mile-high view diagrams. They take a bit longer and actually require you to understand a full system in extreme detail in order to create them correctly. However, if I did more of these types of diagrams, the amount of code I refactor would most likely be much smaller because I could avoid design pitfalls ahead of time.

    • Vineet Sinha says:

      Tanner, This is exactly what I have been noticing as well.

      As for mile-high view diagrams: yes they can take longer, but they are incredibly useful to have. Every time I have gone through such a diagram I find interesting and important facts about the code that I should have realized but did not because I was only looking at the details. Additionally, as for them taking longer – you can find several tools that can help you with them.

  6. Randy Buchholz says:

    Good article, and I think the most important takeaway is avoiding the kitchen sink. A problem with modeling is that there is little science to it; it’s 95% art. No matter how simple the model is and how well the user understands models, it only gets you so far, and it can be counterproductive, as Seth indicated. 100 modelers with the same training, modeling the same small system, will produce 100 distinct models.

    As for time, I’ve always found state models to be one of the more useful diagrams. They sort of handle time, though not linearly. If time is important I find it best to abstract and use a simulation tool, though relating this 1-to-1 to software components can be difficult. Balance is very important in any modeling endeavor.

    As for use cases, I’ve been doing modeling as my core job function for over 10 years and still don’t understand why anyone would use a use case except for the first meeting with a customer. Use cases have no context, which is a big issue. How do you know when you have captured the scope of the business domain or when you are done? I think true process models are the best way to model business or business-to-technology information. Data flow diagrams are a very good substitute, especially for data-driven applications. In UML an activity diagram will sort of work.

    I agree tools are pretty weak for reversing, but unlike the old days they are really good for code generation if configured correctly. Most allow you to create your own generation templates and they make your code very consistent. You can also keep your hand-coded stuff inside the model objects and push it out from there. I keep EVERYTHING in the model – it is visual source control. The biggest problem I encounter is that developers tend to resist learning to build or use models because “we are programmers, not modelers”.

    • Seth Rosen says:

      Randy, I can tell you’ve spent some time thinking about this and I agree completely with your points. We are working to create a product that will be much better than existing tools at converting code into a usable model. This will help not only in cases where the model is poorly designed or out of date, but also by making modeling a simple task that even programmers who are “not modelers” will not have an issue with.

    • Vineet Sinha says:

      Randy, I have had one thought consistently regarding the phrase “we are programmers, not modelers”. Lots of people have said one version or another of it, and I think the real story is that they are complaining not about modeling but about the limitations of modeling tools – that it is just faster to write code. After all, modeling is just another language that a programmer should be able to use without having to worry about it.

      I am hoping that improvements to some of the more recent approaches for text-based interfaces to modeling, as well as some of the DSL work, get polished enough for people to benefit from such approaches.

  7. Dave says:

    I *wish* your sequence diagrams showed the entire method call text instead of having to roll-over, but I’ll be darned if I can figure out how to get a screenshot like the one you have above.

    • Seth Rosen says:

      Dave, thanks for checking out the tool! We have purposely truncated the method calls in order to conserve space in the diagram. We found that showing all method names all the time can result in an overwhelming amount of information being shown and obscure important relationships. However, when a diagram is exported, saved as an image, shared, or emailed it will un-hide these elements as you see in the diagram in the article.

      If you think adding an option to show all the method calls/names when exploring the diagram would be useful to you please let us know. We prefer to track feature requests via our forum so we can gauge demand.

  8. edcarden says:

    Maybe it’s because I’m a DB guy and not a traditional OOP or procedural programmer, but to me the ineffectiveness of the available tools comes from their somewhat flat approach.

    If I want to explore the relationships between objects, be they tables in a DB or classes in a traditional OOP app, I want to do it in 3 dimensions, at least as much as a 2-dimensional display can offer. I have for a long time wished for a developer’s version (and perhaps one exists that I am unaware of) of what the Marvel Comics website uses for the UNIVERSE site, where connections between characters, events, and such are visually represented so that you see not only their connections but also how close or far away those connections are.

    As I said it may be that something like this exists for there are many, many tools out there.

    Bear with me as this is hard to articulate and not be verbose.

    IMHO I would like a tool that works like the navigation software in the TOM CRUISE movie MINORITY REPORT where he visually browses through and is able to go in and out of places and objects until he finds what he wants.

    Imagine reviewing the workings of an application in this way. At the start you open the app and see all the base objects at a summary level, including the starting or entry point of the app’s workings. You can then choose to walk through the app as a user and see how information travels between objects, optionally pausing it to look deeper into an object and examine its state at that time. At the first level each object has a black-box view where you see what goes in and what comes back out. You can also choose to go deeper (and rewind the action) and watch the object’s inner workings at a closer level. This way, if an object works as expected you aren’t overwhelmed with unnecessary info, and yet if something is wrong you can optionally go deeper.

    For a DB guy like myself who spends a good bit of time trying to see how the app is using DB equivalent objects (UDFs, stored procs, etc.), it would be a big help if I could see a simple association or connection between nested stored procedure calls.

    I guess the best way to describe this would be to call it an interactive 3 dimensional debugger.

    In the end I doubt any one tool would be best for even a large number of the developer community simply because we all are different and so what works great for one person may do poorly for another.

    • Seth Rosen says:

      Great points Ed!

      That is exactly what we are trying to do at Architexa. Have you checked out what we are doing? Do drop us a note and let us know what you think.

      Beyond that, I often think about similar ideas. This, for instance, always gets my creative juices flowing. Having an IDE that incorporates different views of the code and that can also accommodate run-time information and debugging would be a great tool. However, visualizations of code come with their own set of problems, which we are working carefully to avoid. The future of visualizations and development sounds like a fun idea for an upcoming blog post.

  9. Jaizon Lubaton says:

    Wow, Architexa helped me solve one of my complex programming tasks the first time I used it. You only see the classes and methods that you need…
