Home
Codegex is a regular-expression-based approach to automated code review that uses several strategies to extract analysis contexts (syntax and type information) from program text.
Workflow

The above figure shows an overview of the Codegex workflow. It has three steps: preprocessing, regex-based analysis, and PR comment generation.
- Given a diff patch in a PR (Pull Request), our tool first splits the text into program statements using the statement terminators of Java programs (i.e., `;`, `{`, and `}`), and ignores deletions and comments.
- Then, the regex-based analyzer applies keyword filtering and regex matching to each statement to find buggy code. The analyzer may also use analysis heuristics (diff search and online search) to improve the accuracy of some patterns, such as `UI_INHERITANCE_UNSAFE_GETRESOURCE`, which warns that the use of `this.getClass().getResource()` may be unsafe if the class is extended by a class in another package. Its output is a JSON file that records bug instances, including pattern type, description, source information, and priority.
- Finally, the PR comment generator sends requests to the GitHub API to annotate code and leave code review comments.
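
The first two steps can be illustrated with a minimal Python sketch. It is not Codegex's actual implementation: the function names, the regex, the JSON field names, and the priority value are all assumptions chosen for illustration. The sketch also assumes that the patch contains only hunks (no file headers), and its comment stripping is naive (it does not account for comment markers inside string literals).

```python
import json
import re

# All names below are hypothetical; they are not taken from the Codegex code base.

# Naive comment removal: line comments and block comments.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Java statement terminators named in the workflow: ';', '{' and '}'.
TERMINATOR_RE = re.compile(r"[;{}]")
# Assumed regex for this.getClass().getResource(...), tolerating whitespace.
UNSAFE_GETRESOURCE_RE = re.compile(
    r"\bthis\s*\.\s*getClass\s*\(\s*\)\s*\.\s*getResource\s*\("
)


def added_lines(patch: str) -> list[str]:
    """Return added and context lines of a diff hunk, ignoring deletions."""
    kept = []
    for line in patch.splitlines():
        if line.startswith("@@"):
            continue  # hunk header
        if line.startswith("-"):
            continue  # deleted line
        # Drop the leading '+' or ' ' diff marker if present.
        kept.append(line[1:] if line[:1] in ("+", " ") else line)
    return kept


def split_statements(patch: str) -> list[str]:
    """Preprocessing: strip comments, then split the text on terminators."""
    text = COMMENT_RE.sub(" ", "\n".join(added_lines(patch)))
    pieces = TERMINATOR_RE.split(text)
    # Collapse internal whitespace and discard empty pieces.
    return [" ".join(p.split()) for p in pieces if p.strip()]


def analyze(statements: list[str]) -> list[dict]:
    """Analysis: cheap keyword filter first, then the regex match."""
    instances = []
    for stmt in statements:
        if "getResource" not in stmt:
            continue  # keyword filter skips most statements quickly
        if UNSAFE_GETRESOURCE_RE.search(stmt):
            instances.append({
                "type": "UI_INHERITANCE_UNSAFE_GETRESOURCE",
                "description": "Usage of this.getClass().getResource() may be "
                               "unsafe if this class is extended by a class "
                               "in another package.",
                "statement": stmt,   # stand-in for source information
                "priority": "low",   # placeholder value
            })
    return instances


SAMPLE_PATCH = """@@ -1,3 +1,4 @@
 class Icons {
-    URL old = Icons.class.getResource("a.png");
+    // load icon via this.getClass().getResource("x")
+    URL url = this.getClass() .getResource("a.png");
 }
"""

report = analyze(split_statements(SAMPLE_PATCH))
print(json.dumps(report, indent=2))
```

In this sketch the keyword filter runs before the regex so that most statements are rejected by a plain substring test. The commented-out call in the sample patch is not reported because comments are removed before splitting.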