Code Inspection - Alibaba DevOps Practice Guide Part 12

This article is from the Alibaba DevOps Practice Guide, written by the Alibaba Cloud Yunxiao Team.

As businesses evolve and teams expand, software grows and call chains become more complex. Without a sound code inspection mechanism, a team that relies on functional verification alone accumulates technical debt. Developers then spend considerable time and energy finding and fixing code defects. Eventually, iteration progress and collaboration efficiency suffer, and serious security problems can result.

Based on problems exposed across the industry in recent years and on Alibaba's long-term internal R&D experience, many common defects can be reduced through the sensible use of code inspection and analysis tools. These tools help developers locate and correct code defects quickly, which frees code designers to focus on analyzing and solving design defects. They also reduce the time spent on manual code inspection, improve software reliability, and lower development costs.

Given these advantages, code inspection tools can be integrated into the continuous integration system to expose hidden errors and defects early in the process. Configuring control gates in that process helps ensure the quality and maintainability of the project code.

Problems

In the daily R&D process, the problems we face with code assets fall into two main categories: code quality problems and code security vulnerabilities.

Code Quality

Code quality is a familiar topic. Everyone knows it is important, but few know how to improve and maintain what is a shared asset of the whole team. On the one hand, developers may neglect quality control in order to release features on time. On the other hand, developers differ in their coding habits and in how they understand a program. In the long run, growing business pressure degrades code quality, which lowers development efficiency. Lower efficiency increases business pressure in turn, and the result is a vicious circle.

In practice, we use code review, integration testing, code inspection, code specifications, and other methods to keep code quality and maintainability high and to support efficient collaboration within the development team from a process perspective.

Code Security

Security issues often hide in code written without security awareness and in open-source dependencies that are neither scanned nor maintained. They are difficult to detect during daily development and code review.

We can analyze code security issues in the following two aspects:

Coding security issues, which are violations of security norms: The risks of private data leakage, injection, and security policy vulnerabilities can be reduced by preventing non-compliant code from entering the code libraries.
Dependency security issues, which are security vulnerabilities introduced by third-party open-source components: More than 99% of organizations use open-source technologies, according to the Synopsys 2020 open-source security report. Open-source components have many advantages, such as enabling technical exchange and collaboration, reducing development costs, accelerating iteration, and improving software quality. However, alongside this convenience, they also bring a large number of security risks. According to the audits, 75% of code libraries have security vulnerabilities, and 49% contain high-risk problems. In addition, 82% of code libraries still use components that are more than four years out of date.

Code security issues also call for an admission check. Configure the security specification checks and gates according to the business scenario and specifications. Regular maintenance is also needed to detect and fix newly disclosed security vulnerabilities.

Solutions

Code Quality Check

Java Code Compliance Check

Within Alibaba, organizations differ in engineering structure, code style, and specifications because of historical barriers and differing business styles. As a result, communication costs are high, collaboration efficiency is low, and maintenance costs are high. Alibaba Group needs a professional technical force that does iterative, intensive development rather than repeating the same work. A professional team must have a unified development protocol that stands for efficiency, resonance, feelings, and sustainability.

Against this background, Alibaba formulated the Alibaba Java Development Manual, a development specification for Java engineers within Alibaba. The manual covers programming compliance, unit testing, exception logs, MySQL, engineering, and security, and it summarizes the experience of nearly 10,000 Alibaba Java technical experts. It has been through many rounds of large-scale frontline practice, testing, and improvement.

Traffic regulations may appear to restrict the right to travel, but they are designed to protect public safety. Imagine there were no speed limits, no traffic lights, and no lanes specifying the direction of traffic flow; who would dare to walk on the sidewalk? Similarly, a software specification is not meant to eliminate creativity and elegance in code but to restrain excessive personalization, establish relative standardization, and let people work together in a generally recognized way.

Therefore, the objectives of the code specification are listed below:

Coding with Efficiency: Unified standards improve communication and R&D efficiency.
Coding with Quality: Take preventive measures, improve quality awareness and system maintainability, and reduce the failure rate.
Coding with Feelings: Bring a spirit of craftsmanship to the pursuit of excellence and the production of fine code.

Code compliance is built into development activities across Alibaba through IDE inspection plug-ins, pipeline integration testing, code review integration, and other tools. Java code inspection is also integrated into Codeup, Yunxiao's code management platform, so developers can check their code conveniently during the submission and review phases.

Intelligent Code Patch Recommendation

Defect detection and patch recommendation have been a challenge in software engineering for decades, and they remain a concern for both researchers and frontline developers. The defects discussed here are not network vulnerabilities or system defects but defects hidden in the code. Helping developers identify and fix these defects improves software quality.

After analyzing the defect detection methods popular in industry and academia and working around their limitations, algorithm engineers at Alibaba Codeup proposed a new algorithm. The algorithm analyzes code defects more precisely and efficiently and recommends optimizations. It has been accepted by the International Conference on Software Engineering (ICSE). The algorithm works in four steps:

Find fix-type commits by matching keywords in the commit message, and keep only commits that involve fewer than five files. (In a commit that touches too many files, the fix itself may be diluted.): This step relies on good commit habits. We hope developers use commits well and write clear messages.
Collect the deleted and added content at the file level from these fix-type commits to form Defect and Patch pairs (DP Pairs): This step inevitably produces a lot of noise.
Cluster the defect and patch pairs together with an improved DBSCAN method so that similar defects and patches are grouped. (Segment-level clustering can also be performed.): Clustering similar defects and fixes removes much of the noise left by the previous step. In addition, common mistakes in historical commits can serve as references.
Summarize the defect code and patch code with a purpose-built template extraction method, and adapt the template to its context according to the variables involved.
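The first two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm published by the Codeup team: the `Commit` and `FileChange` types, the keyword list, and the helper names are all hypothetical, and the diff parsing handles only plain unified diffs.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the article does not say which keywords are used.
FIX_KEYWORDS = re.compile(r"\b(fix|fixes|fixed|bug|defect|repair)\b", re.IGNORECASE)
MAX_FILES = 5  # the text keeps commits that involve fewer than five files


@dataclass
class FileChange:
    path: str
    diff: str  # unified diff text for a single file


@dataclass
class Commit:
    message: str
    changes: list  # list of FileChange


def is_fix_commit(commit):
    """Step 1: keyword match on the message plus the file-count limit."""
    return bool(FIX_KEYWORDS.search(commit.message)) and len(commit.changes) < MAX_FILES


def extract_dp_pair(change):
    """Step 2: split one file's diff into deleted (defect) and added (patch) lines."""
    deleted, added = [], []
    for line in change.diff.splitlines():
        # Skip file headers and hunk markers. A real parser would need to
        # distinguish these from content lines that happen to look similar.
        if line.startswith(("--- ", "+++ ", "@@")):
            continue
        if line.startswith("-"):
            deleted.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
    return deleted, added


def collect_dp_pairs(commits):
    """Return (path, defect_lines, patch_lines) for every fix-type change."""
    pairs = []
    for commit in commits:
        if not is_fix_commit(commit):
            continue
        for change in commit.changes:
            deleted, added = extract_dp_pair(change)
            if deleted and added:  # keep only changes that replace code
                pairs.append((change.path, deleted, added))
    return pairs
```

The pairs collected this way are still noisy, which is why the clustering step that follows matters: only defect and patch shapes that recur across many commits survive as recommendation candidates.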

The code patch recommendation service is currently applied to automatic code scanning for merge requests. During code review, the service detects code snippets that can be improved, suggests optimizations, and accumulates experience from historical reviews to keep improving the quality of enterprise code.

Code Security Check

Sensitive Information Detection

In recent years, sensitive information (such as API keys, database credentials, and OAuth tokens) has often been leaked inadvertently across the industry, bringing security risks and direct economic losses to enterprises.

We have faced similar problems in our own practice. Hard-coded secrets appear frequently, and we lacked an effective way to identify them. Developers and enterprise managers therefore urgently need a stable and sound method and system for detecting sensitive information. Our research showed that most existing detection tools rely on either rule matching or information entropy, and neither achieves the expected recall or accuracy on its own. We therefore built a sensitive information detection tool called SecretRadar, which combines rule matching and information entropy with context semantics in a multi-layer detection model.

SecretRadar is implemented in three layers. The first layer uses rule matching, the traditional technique for identifying sensitive information. It is accurate and extensible but relies heavily on relatively fixed lengths, prefixes, and variable names, so it copes poorly with the varied coding styles of different developers and may miss unusual cases. The second layer applies an information entropy algorithm where fixed rules are hard to match. Information entropy measures how random a line of code looks, and it works well for recognizing randomly generated keys and random identity information. It has its own limitation, however: as recall rises, so do false alerts. The third layer therefore filters and refines the results with methods such as template clustering and context semantic analysis. These methods aggregate and extract common keywords from the entropy results and combine them with context semantics and the current syntactic structure to improve the accuracy of the model.
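The first two layers can be illustrated with a minimal Python sketch. It is not SecretRadar's implementation: the rule, the string-literal pattern, and the 4.0 bits-per-character threshold are assumptions chosen for the example, and the third layer (template clustering and context semantics) is left out.

```python
import math
import re
from collections import Counter

# Layer 1: an illustrative rule keyed on telltale variable names (assumed).
RULES = [
    re.compile(r"(?i)(password|passwd|secret|token)\s*[=:]\s*[\"'][^\"']{6,}[\"']"),
]

# Layer 2 looks only at long quoted literals made of key-like characters.
STRING_LITERAL = re.compile(r"[\"']([A-Za-z0-9+/=_\-]{20,})[\"']")
ENTROPY_THRESHOLD = 4.0  # assumed cut-off, in bits per character


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def scan_line(line):
    """Return which layer flagged the line: 'rule', 'entropy', or None."""
    if any(rule.search(line) for rule in RULES):
        return "rule"
    for match in STRING_LITERAL.finditer(line):
        if shannon_entropy(match.group(1)) >= ENTROPY_THRESHOLD:
            return "entropy"
    return None
```

Lowering the threshold catches more random-looking strings but also flags more hashes, UUIDs, and encoded test data, which is the recall versus false-alert trade-off that the third layer is meant to address.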

The sensitive information detection tool serves our internal developers and supports more than 20,000 code libraries and 3,000 enterprises on the Yunxiao platform. It has helped developers resolve more than 90,000 hard-coded secret problems.

Source Code Vulnerability Detection

Alibaba uses the Sourcebrella Pinpoint detection engine to detect source code vulnerabilities, mainly injection and security policy risks.

The Sourcebrella detection engine is the result of ten years of research by the Prism Research Group at The Hong Kong University of Science and Technology. It builds on nearly a decade of results in software verification and implements an independently designed, technically advanced software verification system. Its main verification method translates program code into mathematical expressions, such as first-order logic and linear algebra, and reasons about the causes of defects through formal verification. So far, four papers on the core technologies have been published, including one PLDI paper and three ICSE papers.

The Sourcebrella detection engine can find defects that have remained hidden for more than ten years in large, active open-source projects, such as the defects it detected in MySQL [5], which other inspection tools on the market could not find. The engine can inspect a project with 2 million lines of code in 1.5 hours while keeping the false positive rate at about 15%. For large and complex projects, it leads the industry in both scanning efficiency and false positive rate.

Source code vulnerability detection integrates the security analysis capabilities of the Sourcebrella detection engine and balances analysis accuracy, speed, and depth. It has the following core advantages:

Bytecode Analysis: The code logic of second-party and third-party libraries is not missed.
Long Call Chains: It is adept at analyzing logic that spans long cross-function call chains.
References and Pointers: It handles indirect data modifications caused by references and pointers.
High Accuracy: It identifies problems more accurately and effectively than similar tools, such as Clang and Infer.
Good Performance: The analysis of a single application completes in about five minutes on average.

The Sourcebrella detection engine tracks data flow in the code accurately and analyzes function call chains with high depth and precision, so it can find deep problems that span multiple layers of functions. When it discovers a defect, it can show how the fault is triggered, along with the related control flow and data flow, to help developers understand and fix it. Finding defects early in development improves software quality at a lower cost, reduces production costs, and improves R&D efficiency.

Dependent Package Vulnerability Detection

We want to give developers an effective mechanism for detecting and managing the security and reliability of open-source components. Therefore, we have implemented a vulnerability detection service for dependent packages, together with security issue reports for those packages. In practice, developers generally report that fixing vulnerabilities in dependent packages costs more than fixing vulnerabilities in their own code, so they are reluctant to deal with such problems. On the one hand, most vulnerabilities are not introduced directly; they come in through components that a third-party dependency depends on in turn. On the other hand, it is unclear which version is clean, available, and compatible.

To make fixes simpler, we analyze the reference relationships among dependencies, distinguish direct from indirect dependencies, and locate the source files of specific dependent packages, so developers can quickly find where the key problem lies. Because one dependency may have multiple vulnerabilities, the vulnerability data is aggregated to recommend a version upgrade that fixes them. Developers can evaluate whether to adopt the suggestion and can measure the cost of an upgrade by analyzing the API changes and code call chains between versions. The service can also create fix reviews automatically to make code security maintenance as efficient as possible.
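Telling direct dependencies from indirect ones amounts to a search over the dependency graph. The Python sketch below shows one way to do it, assuming the graph has already been resolved into a dictionary that maps each package to the packages it depends on. The package names and function names are hypothetical, and this is not the Yunxiao service's implementation.

```python
from collections import deque


def find_dependency_path(graph, root, vulnerable):
    """Breadth-first search for the shortest chain from root to a vulnerable package."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == vulnerable:
            return path
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None  # the package is not reachable from the project


def classify(path):
    """A path of [project, package] is direct; anything longer is indirect."""
    if path is None or len(path) < 2:
        return "absent"
    return "direct" if len(path) == 2 else "indirect"


# Hypothetical project: "app" pulls in "logging-lib" only through "web-framework".
graph = {
    "app": ["web-framework", "json-lib"],
    "web-framework": ["logging-lib"],
    "json-lib": [],
    "logging-lib": [],
}
path = find_dependency_path(graph, "app", "logging-lib")
```

Here `path` is the chain from the project to the vulnerable package, and its second element names the direct dependency a developer would need to upgrade or override.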

Detection Service Application

Code Submission

The detection service can be used directly at code submission. Enterprises define and configure check plans for different projects according to their business scenarios and specifications. When developers push code changes to the server, the detection service configured for that code library is triggered automatically. It checks the committed version for all problems, which helps developers find new problems as early as possible and confirm that existing ones are resolved. In this way, testing shifts left along several dimensions, such as code specification, code quality, and code security, and developers receive feedback as soon as they finish coding.

Code Review

In enterprise project collaboration, developers use merge requests to merge feature branch code into the main branch. The merge request process requires code review and manual inspection by the project owner or module owner. Manual review takes a lot of energy, yet it can hardly cover the potential problems in every dimension of the code. Configuring the detection service properly reduces the manual review workload and speeds up code review. Drawing on extensively filtered and accumulated detection rule sets and human experience, the detection service acts as an additional gate in scenarios closely tied to the business, preventing unqualified and risky code from entering enterprise code libraries.
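A gate of this kind can be reduced to comparing scan results against per-severity limits. The Python sketch below is purely illustrative: the `Issue` type, the severity names, and the limits in `CHECK_PLAN` are assumptions for the example and do not reflect Codeup's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    rule: str
    severity: str  # assumed levels: "blocker", "major", "minor"
    file: str


# Hypothetical check plan: maximum number of issues allowed per severity.
# Severities that are not listed never block the merge.
CHECK_PLAN = {"blocker": 0, "major": 3}


def evaluate_gate(issues, plan=CHECK_PLAN):
    """Return (passed, violations), where violations maps severity to its count."""
    counts = {}
    for issue in issues:
        counts[issue.severity] = counts.get(issue.severity, 0) + 1
    violations = {
        severity: counts[severity]
        for severity, limit in plan.items()
        if counts.get(severity, 0) > limit
    }
    return (not violations), violations
```

A merge request would be blocked when `evaluate_gate` returns `False`, and the violations dictionary tells the reviewer which severity exceeded its limit.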

Code Evaluation

The detection service helps developers identify and solve problems as early as possible during code submission and review. It also lets managers evaluate the quality of enterprise code and visualize risks. The enterprise-level reporting service and project task management give a more intuitive view of security and quality issues as a project evolves.
