
Why Open Source + Legacy Modernization Tools
In daily software development, we often encounter a series of problems, such as:
- How do we compensate for the limits of human intelligence? With patterns, principles, and tools.
- Who should fix the code? Code.
- …
Automated tools, as some call them, are one of the solutions to these issues. Underpinning these tools is a set of principles and patterns that are built into them. Another solution, concerned with human growth, is “meta-meta”, which is another story.
Legacy systems are the norm. In the consulting team, most of the systems we encounter are legacy systems. When we join a new project, we may need to analyze them quickly and provide insights, that is, write a PPT report. As a result, over the past few years the consulting team has accumulated a series of legacy system analysis and refactoring tools, such as Xinge’s Tequila and the open-source architecture analysis and guarding tool ArchGuard.
Building tools is a labor of love. For ThoughtWorks, the biggest challenge in writing tools is not technical but copyright: was your tool derived from client work? Here, however, we mainly write code analysis tools, so this is basically not a problem. The other challenge is that you need to spend your spare time perfecting the tool. You can’t do this on the client’s billable time.
Since you have to develop on your own time, and the work has nothing to do with the project, open source is the most appropriate home for this kind of labor of love.
What kind of tools do we need?
Judging from how we use such tools, we need a modern tool to offer:
- Visualization-driven analysis. Quickly generate project analysis results, show them to developers so they understand the status quo, and use them to write the PPT.
- The necessary interactivity. Used to find suitable entry points in the process of refactoring.
- Customized development.
- Custom bad smells. Different development teams have different bad smells, and some bad smells cannot be identified by a tool like SonarQube.
- Automated refactoring. Automatically refactor the code based on known bad smells and the corresponding code locations.
- Appropriate syntactic precision. Higher parsing precision means higher development cost, so the two need to be balanced case by case.
- Cross-platform support. We use macOS, while customers mostly use Windows.
How do we develop such a tool?
The legacy system modernization tools defined here include the following parts: syntax analysis, results and visualization, automated refactoring, and architecture guarding.
Parsing
The code is parsed to generate language-specific data structures. Commonly used tools are: Antlr, Ctags, TreeSitter, Doxygen, CodeQuery, etc. A rough comparison (off the top of my head) is shown in the following table:
Tool | Accuracy | Development difficulty | Cross-language learning cost | Cost of adding a new language | Supports automated refactoring |
Language compiler | Perfect | Low | High | - | Yes |
Antlr | Extremely high | Medium | Medium | Medium | Yes |
Ctags | Medium | Low | Low | High | Yes (high cost) |
TreeSitter | High | High | Medium | High | Yes (high cost) |
Doxygen | Medium | Low | Low | High | No |
CodeQuery | High | Medium | Medium | High | Yes |
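To make "language-specific data structures" concrete, here is a minimal sketch of the "language compiler" row, using Python's built-in `ast` module on Python source. The `ClassInfo` and `FunctionInfo` types and the sample class are hypothetical and only illustrate the idea; they are not the model that Coca or Chapi produce.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class FunctionInfo:
    name: str
    line: int
    calls: list = field(default_factory=list)  # names of called functions


@dataclass
class ClassInfo:
    name: str
    line: int
    methods: list = field(default_factory=list)  # FunctionInfo items


def collect_calls(node):
    """Return the names of functions or methods called anywhere inside `node`."""
    calls = []
    for child in ast.walk(node):
        if isinstance(child, ast.Call):
            target = child.func
            if isinstance(target, ast.Name):         # validate(order)
                calls.append(target.id)
            elif isinstance(target, ast.Attribute):  # self.repo.save(order)
                calls.append(target.attr)
    return calls


def parse_classes(source):
    """Parse Python source and return its top-level classes with their methods."""
    classes = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            info = ClassInfo(name=node.name, line=node.lineno)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    info.methods.append(
                        FunctionInfo(item.name, item.lineno, collect_calls(item))
                    )
            classes.append(info)
    return classes


# Hypothetical sample input, used only for illustration.
SAMPLE = '''
class OrderService:
    def create(self, order):
        validate(order)
        self.repo.save(order)
'''

if __name__ == "__main__":
    for cls in parse_classes(SAMPLE):
        print(cls)
```

The same shape (classes, methods, calls, line numbers) is what the later visualization and refactoring steps consume; only the parser in front of it changes from language to language.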
Results and visualization
Typically, we visualize legacy systems for the following reasons:
- Quantify. For example, measure specific bad smells, similar to SonarQube; the common patterns and principles come from the book Refactoring. Coca also introduced test bad smells described in some papers, such as tests without assertions.
- Visualize dependencies. For example, visualizing the dependencies between classes, packages, etc. in the code, mainly to analyze the layered architecture. Commonly used tools are PlantUML, Graphviz, D3.js, Echarts, etc.
- Visualize code properties. For example, by visualizing attributes such as a file's modification frequency and size, we can see how often each file changes per unit of time. If a file is modified frequently and is referenced a lot, it is an unstable class or file. Apart from business changes, the most likely cause is an unreasonable design.
- Other.
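The "modified frequently and referenced a lot" heuristic from the list above can be sketched as a simple ranking. This is only an illustration under stated assumptions: the input shapes, the file names, and the score (changes multiplied by inbound references) are hypothetical choices, not how Coca or Coco compute it.

```python
from collections import Counter


def rank_hotspots(changed_files, references, top=3):
    """Rank files by (number of changes) * (number of inbound references).

    changed_files: one entry per file per commit, for example flattened
                   from `git log --name-only` output (assumed input shape).
    references:    (source_file, target_file) pairs from a dependency analysis.
    Returns tuples of (path, changes, inbound_references, score).
    """
    change_count = Counter(changed_files)
    inbound = Counter(target for _, target in references)
    scored = [
        (path, changes, inbound[path], changes * inbound[path])
        for path, changes in change_count.items()
    ]
    # Highest score first; ties are broken by path to keep the order stable.
    scored.sort(key=lambda row: (-row[3], row[0]))
    return scored[:top]


# Hypothetical sample data, used only for illustration.
CHANGED = ["Order.java", "Order.java", "Order.java",
           "Util.java", "Cart.java", "Cart.java"]
REFERENCES = [("Cart.java", "Order.java"),
              ("Pay.java", "Order.java"),
              ("Cart.java", "Util.java")]

if __name__ == "__main__":
    for row in rank_hotspots(CHANGED, REFERENCES):
        print(row)
```

A file that changes often but is referenced by nothing scores zero here, which matches the intent: the risk comes from the combination of churn and coupling.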
Automated refactoring
This step is optional and depends on the scenario. Such functionality mostly makes up for things that modern IDEs can’t do, such as:
- Unused class removal across multiple repositories.
- Clustering across multiple codebases.
- Refactoring for CSS colors.
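As an illustration of the last item, the sketch below extracts hex colors from a stylesheet and replaces known ones with LESS-style variables. It is deliberately regex-based, so it sits at the low-precision end of the comparison above: it would, for example, mistake an ID selector such as `#abc` for a color, which a syntax-tree-based approach avoids. The variable name `@white` and the sample stylesheet are hypothetical.

```python
import re
from collections import Counter

# Matches #rgb and #rrggbb literals. This is a simplification: it cannot
# tell a color from an ID selector that happens to look like one.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def normalize(color):
    """Lowercase a hex color and expand the short form (#abc -> #aabbcc)."""
    digits = color[1:].lower()
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits


def extract_colors(css):
    """Count how often each normalized color occurs in the stylesheet."""
    return Counter(normalize(m.group(0)) for m in HEX_COLOR.finditer(css))


def replace_colors(css, variables):
    """Replace each literal whose normalized value is a key of `variables`."""
    def swap(match):
        literal = match.group(0)
        return variables.get(normalize(literal), literal)
    return HEX_COLOR.sub(swap, css)


# Hypothetical sample stylesheet, used only for illustration.
CSS = ".a { color: #FFF; }\n.b { background: #ffffff; border-color: #1E90FF; }"

if __name__ == "__main__":
    print(extract_colors(CSS))
    print(replace_colors(CSS, {"#ffffff": "@white"}))
```

Normalizing before counting is what makes the refactoring useful: `#FFF` and `#ffffff` are reported as the same color and end up behind the same variable.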
Architecture guarding
Write rules that guard the architecture of the system. Tools for this include ArchUnit, ArchGuard, etc. Drawing on the syntax of ArchUnit, we also designed a multi-language architecture guarding tool: Guarding.
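A guarding rule boils down to a predicate over the dependency graph. The sketch below checks "package X must not depend on package Y" rules against a list of dependencies. The `Rule` type, the package names, and the rule format are hypothetical; this is neither ArchUnit's API nor Guarding's DSL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str     # package the rule applies to
    forbidden: str  # package it must not depend on


def in_package(name, package):
    """True if the fully qualified `name` is `package` or lives inside it."""
    return name == package or name.startswith(package + ".")


def check(dependencies, rules):
    """Return (source, target, rule) for every dependency that breaks a rule."""
    violations = []
    for source, target in dependencies:
        for rule in rules:
            if in_package(source, rule.source) and in_package(target, rule.forbidden):
                violations.append((source, target, rule))
    return violations


# Hypothetical layered application: controllers must go through services.
DEPENDENCIES = [
    ("com.shop.controller.OrderController", "com.shop.service.OrderService"),
    ("com.shop.controller.OrderController", "com.shop.repository.OrderRepository"),
    ("com.shop.service.OrderService", "com.shop.repository.OrderRepository"),
]
RULES = [Rule(source="com.shop.controller", forbidden="com.shop.repository")]

if __name__ == "__main__":
    for source, target, rule in check(DEPENDENCIES, RULES):
        print(f"{source} must not depend on {target}")
```

Run as part of a build, a non-empty result would fail the build, which is what turns a rule from documentation into a guard.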
In the following sections, we’ll go through each of these four parts in depth. Developing these tools also kept pushing me to learn more about what lies behind a language, such as compiler principles (the front-end part of a language), how build systems work, etc.
Legacy Modernization Toolset
To modernize legacy systems in a more targeted manner, we recently created a new organization, Modernizing, which brings together the tools we developed previously. We also created awesome-modernization to collect other related tools.
- In Modernizing, tools for individual programming languages are:
  - Tool for system refactoring, system migration, and system analysis in Java: Coca, Go language, GitHub stars: 691. Coca is a “full-featured” refactoring tool. It performs syntax analysis based on Antlr. In addition to conventional visualization and call analysis, it can also perform automated refactoring. The name Coca is a play on Xinge’s Tequila: tequila versus “happy water” (a nickname for cola).
  - Analysis and automated refactoring tool for CSS/LESS: Lemonj, TypeScript language, GitHub stars: 128. Its main design goal at the time was to extract the colors in CSS, based on Antlr’s syntax tree analysis, so that they can be refactored automatically.
  - Automated analysis of MySQL code that builds UML from it and generates its relationships: SQLing, Go language, parsed with PingCAP’s SQL parser. There is also an early-stage PL/SQL-specific version: pling.
  - Semi-automatic Ant-to-Maven migration tool: Merry, Go language + Antlr.
  - Front-end standardization tool: Clij, TypeScript language, used to add eslint, husky, lint-staged, etc. with one click.
- For multilingual tools, we have:
  - Antlr-based multi-language code model analysis tool: Chapi, Kotlin language. It was originally designed to generate the same data structure as Coca, so that it can plug into more visualization tools. Syntax analysis is done with Antlr.
  - Doxygen-based polyglot analysis and visualization tool: the Go mod version of Xinge’s Tequila. It contains a fair amount of cryptic code that still needs to be refactored.
  - Multilingual model analysis and visualization tool based on Ctags: Modeling, Rust language. It analyzes source code and generates model-based dependency visualizations.
  - Multilingual architecture guarding tool based on TreeSitter: Guarding, Rust language. The system architecture is guarded through a self-made DSL.
  - A secret, work-in-progress architecture guarding tool for a company’s new language: Menschen, Rust + a new language.
In addition, there is another open-source tool under the Inherd open-source group: Coco, which analyzes a system through the physical properties of its code: modification frequency, directory, and number of lines. There is also ArchGuard, whose open-source development is now in full swing.
We use a range of different languages and tools to develop this software because there are different options for different scenarios.
Next step?
The existing tools are scattered, and each produces its own data format; a unified data format is missing. Without a uniform output format, it is difficult to do standard visualization, for example when we build a codecity to visualize legacy systems in the metaverse, or when we reuse the front-end visualization part that is being split out of ArchGuard. Ideally, the toolset should be a pipeline-architecture system, consisting of a series of pipes and filters.
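A pipes-and-filters design with a unified record format could look roughly like the sketch below. Each filter is a generator that consumes and yields the same record shape, so filters can be chained freely and any sink (here a Graphviz DOT renderer) works with any upstream tool. The record fields (`kind`, `source`, `target`), the `A -> B` input format, and the stage names are assumptions made for illustration, not an existing format of these tools.

```python
def parse_stage(raw_lines):
    """Filter: turn tool-specific 'A -> B' lines into unified records."""
    for line in raw_lines:
        if "->" not in line:
            continue  # skip lines this filter does not understand
        source, target = (part.strip() for part in line.split("->", 1))
        yield {"kind": "dependency", "source": source, "target": target}


def exclude_stage(records, prefix):
    """Filter: drop records whose target starts with `prefix`."""
    for record in records:
        if not record["target"].startswith(prefix):
            yield record


def to_dot(records):
    """Sink: render unified records as a Graphviz DOT digraph."""
    lines = ["digraph deps {"]
    for record in records:
        lines.append(f'  "{record["source"]}" -> "{record["target"]}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical output of some analysis tool, used only for illustration.
RAW = [
    "OrderController -> OrderService",
    "OrderService -> java.util.List",
    "not a dependency line",
    "OrderService -> OrderRepository",
]

if __name__ == "__main__":
    # The pipes are plain function composition over generators.
    print(to_dot(exclude_stage(parse_stage(RAW), "java.")))
```

Because every stage speaks the same record format, adding a new parser or a new visualization means writing one more filter or sink rather than a new converter for every pair of tools.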