Python is a dynamically typed language with strong typing. The type of an object is determined at runtime (dynamic typing), but operations between mismatched types are not allowed (strong typing); for example, a str and an int cannot be added together.
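A quick interactive session illustrates the strong-typing side: the interpreter refuses to mix str and int rather than silently coercing them (the exact message below is from recent CPython versions; older releases word it differently).

>>> 'a' + 1
TypeError: can only concatenate str (not "int") to str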
Dynamic typing makes code easy to write. However, as the saying goes, dynamic typing is fun for a while, and refactoring is painful; it also brings plenty of trouble. If a dynamic language adds static type annotations, the main benefits are more readable and self-documenting code, better IDE completion and navigation, and type errors caught before the code runs.
Currently, most mainstream languages (such as Java, Go, and Rust) are statically typed, and dynamic languages are also embracing static types: Python through type hints, and JavaScript through TypeScript.
This article introduces Python's support for static types, the current state of the community, an overview and comparison of type checking tools, and our practice of type parsing.
As early as 2006, the function annotation syntax was proposed (PEP 3107) and introduced in Python 3.0, together with a list of possible improvements:
# Before adding the type
def add(a, b):
    return a + b

# After adding the type
def add(a: int, b: int) -> int:
    return a + b
With continuous evolution, Python 3.5 introduced Type Hints and the typing module (PEP 484), and IDEs can perform type checking based on these annotations. By Python 3.7, static type support had become fairly complete.
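As a minimal sketch of what the typing module enables (the function and names here are illustrative, not from the article's project):

from typing import List, Optional

def find_first_even(numbers: List[int]) -> Optional[int]:
    # Return the first even number, or None if there is none
    for n in numbers:
        if n % 2 == 0:
            return n
    return None

result: Optional[int] = find_first_even([1, 3, 4])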
The following sections describe the type checking tools in detail, along with some basic concepts.
Both the Python core team and major vendors have released type checking tools for Python, with broadly similar feature sets:
mypy, the earliest of these tools, was created by Jukka Lehtosalo and has long been championed by Guido van Rossum, the father of Python. It is integrated with mainstream editors (such as PyCharm, Emacs, Sublime Text, and VS Code), has a solid user base, and its documentation and ecosystem are mature.
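As a quick illustration (a hypothetical file, not from the article), running mypy over annotated code flags mismatched arguments before the program is executed; the exact wording of the message varies by mypy version:

# demo_mypy.py
def add(a: int, b: int) -> int:
    return a + b

add(1, 2)      # fine
add('1', '2')  # mypy reports an incompatible argument type: "str" is not "int"

Checking the file with mypy demo_mypy.py reports the error without running the code.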
Google's pytype is capable of type checking and provides several useful tools:
annotate-ast
: Marks the AST tree during processing
merge-pyi
: Merges the generated pyi files into the original file. It can also hide the types and load them again during type checking.
pytd-tool
: Parses pyi files into pytype's customized PYTD files
pytype-single
: Parses a single Python file, given all of its dependent pyi files
pyxref
: Generates cross-references

Facebook's pyre-check has two special features:
Microsoft's pyright is the newest open-source tool; it claims to be much faster than mypy and, being written in TypeScript and running on Node, it does not depend on a Python environment.
Why pytype? mypy is relatively old, and many of its functions are not useful to us. We plan to use a Python LSP to process Python files and provide some syntax services. pyre-check is written in OCaml, so we chose pytype, which is implemented in Python, to build the features we need. In addition, pytype provides useful tools for parsing pyi files and for generating a pyi file from the types in a Python file.
The letter "i" in pyi refers to interface, which stores the type definitions in Python files in pyi files in the form of an interface to assist the type check.
In the commonly used PyCharm, go to External Libraries > Python 3.6 > Typeshed Stubs. There are many built-in pyi files that assist with type hints and navigation while coding.
The Typeshed Stubs mentioned above are essentially a pre-assembled collection of pyi files; PyCharm appears to maintain its own copy of this data. Many large open-source projects also provide stubs, such as PyTorch, and TensorFlow is considering it. Creating pyi files for large Python libraries takes a lot of work, and many of them call into C APIs, so we need to be patient.
We have read the source code of pytype and distilled it into code that matches our requirements. Some examples follow; the overall code is shown first:
import logging
import sys
import os
import importlab.environment
import importlab.fs
import importlab.graph
import importlab.output
from importlab import parsepy
from sempy import util
from sempy import environment_util
from pytype.pyi import parser
In the demo, Importlab is used to parse the project's dependencies and the corresponding pyi files:
def main():
    # Specify the project directory to parse
    ROOT = '/path/to/demo_project'
    # Specify the typeshed directory, which can be downloaded from https://github.com/python/typeshed
    TYPESHED_HOME = '/path/to/typeshed_home'
    util.setup_logging()
    # Load typeshed. If TYPESHED_HOME is not correctly configured, this returns None
    typeshed = environment_util.initialize_typeshed_or_return_none(TYPESHED_HOME)
    # Load the valid files from the target directory
    inputs = util.load_all_py_files(ROOT)
    # Create the environment used to build the import graph
    env = environment_util.create_importlab_environment(inputs, typeshed)
    # Build the import graph from the pyi files and the project files
    import_graph = importlab.graph.ImportGraph.create(env, inputs, trim=True)
    # Print the dependency tree
    logging.info('Source tree:\n%s', importlab.output.formatted_deps_list(import_graph))
    # Aliases of imported modules, e.g. import numpy as np -> {'np': 'numpy'}
    alias_map = {}
    # Mapping from module name to pyi file, e.g. import os -> {'os': '/path/to/os/__init__.pyi'}
    import_path_map = {}
    # A value in alias_map corresponds to a key in import_path_map, so a key of alias_map
    # can be used to find the file that actually implements the module.
    for file_name in inputs:
        # If a pyi file is found, the dependency is marked as resolved.
        # Built-in dependencies are skipped and not returned.
        # Custom dependencies are marked as unresolved for further parsing to locate the project file.
        (resolved, unresolved) = import_graph.get_file_deps(file_name)
        for item in resolved:
            item_name = item.replace('.pyi', '') \
                .replace('.py', '') \
                .replace('/__init__', '').split('/')[-1]
            import_path_map[item_name] = item
        for item in unresolved:
            file_path = os.path.join(ROOT, item.new_name + '.py')
            import_path_map[item.name] = file_path
        import_stmts = parsepy.get_imports(file_name, env.python_version)
        for import_stmt in import_stmts:
            alias_map[import_stmt.new_name] = import_stmt.name
    print('The import relationships obtained through importlab parsing are as follows\n\n')
    # For code query scenarios, alias_map associates the object currently in use with the imported module.
    print('\n\n#################################\n\n')
    print('For code query scenarios, alias_map can associate the currently used object with the imported module.')
    print('alias_map: ', alias_map)
    # For code completion scenarios, parse the current file and the referenced pyi files.
    # If the current file is an __init__ file, search all files in the directory.
    print('\n\n#################################\n\n')
    print('For code completion scenarios, parse the current file and the referenced pyi files. If the current file is an __init__ file, search all files in the directory.')
    print('import_path_map: ', import_path_map)
    print('\n\n\nBy using pytype, parse the AST of pyi files to analyze the return types of third-party dependencies and infer the types of the current variables.\n\n')
    # Use pytype to parse the dependent pyi files and obtain the return types of the called methods
    fname = '/path/to/parsed_file'
    with open(fname, 'r') as reader:
        sourcecode = reader.read()
    ret = parser.parse_string(sourcecode, filename=fname, python_version=3)
    constant_map = dict()
    function_map = dict()
    for key in import_path_map.keys():
        v = import_path_map[key]
        with open(v, 'r') as reader:
            src = reader.read()
        try:
            res = parser.parse_pyi(src, v, key, 3)
        except:
            continue
        # Aliases and classes in the pyi are ignored here; record constants and function signatures
        for constant in res.constants:
            constant_map[constant.name] = constant.type.name
        for function in res.functions:
            signatures = function.signatures
            sig_list = []
            for signature in signatures:
                sig_list.append((signature.params, signature.return_type))
            function_map[function.name] = sig_list
    var_type_from_pyi_list = []
    for alias in ret.aliases:
        variable_name = alias.name
        if alias.type is not None:
            typename_in_source = alias.type.name
            typename = typename_in_source
            # The import may use an alias; convert it to the real module name
            if '.' not in typename:
                # A plain alias rather than the return value of a function call: ignore it
                continue
            if typename.split('.')[0] in alias_map:
                real_module_name = alias_map[typename.split('.')[0]]
                typename = real_module_name + typename[typename.index('.'):]
            if typename in function_map:
                possible_return_types = [item[1].name for item in function_map[typename]]
                var_type_from_pyi_list.append((variable_name, possible_return_types))
            if typename in constant_map:
                possible_return_type = constant_map[typename]
                var_type_from_pyi_list.append((variable_name, possible_return_type))
    print('\n\n#################################\n\n')
    print('These are all the return value types analyzed from the pyi files.')
    for item in var_type_from_pyi_list:
        print('Variable name:', item[0], 'Return type:', item[1])


if __name__ == '__main__':
    sys.exit(main())
The file being parsed in the demo is as follows:
# demo.py
import os as abcdefg
import re
from demo import utils
from demo import refs
cwd = abcdefg.getcwd()
support_version = abcdefg.supports_bytes_environ
pattern = re.compile(r'.*')
add_res = utils.add(1, 3)
mul_res = refs.multi(3, 5)
c = abs(1)
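The article does not show demo/utils.py or demo/refs/__init__.py; a minimal sketch that would satisfy the two calls above (hypothetical contents, assumed only for illustration) could look like this:

# demo/utils.py (hypothetical)
def add(a: int, b: int) -> int:
    return a + b

# demo/refs/__init__.py (hypothetical)
def multi(a: int, b: int) -> int:
    return a * b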
pytype builds on Importlab, another open-source project from Google. The files in the typeshed directory are placed into the environment so that the dependencies between files can be analyzed; Importlab then generates a dependency graph.
env = environment_util.create_importlab_environment(inputs, typeshed)
import_graph = importlab.graph.ImportGraph.create(env, inputs, trim=True)

# If a pyi file is found, the dependency is marked as resolved.
# Built-in dependencies are skipped and not returned.
# Custom dependencies are marked as unresolved for further parsing to locate the project file.
(resolved, unresolved) = import_graph.get_file_deps(file_name)
Through the import analysis, we obtain the real module behind each reference (including aliased imports), which is what later allows the return values of method calls to be resolved:
{'ast': 'ast', 'astpretty': 'astpretty', 'abcdefg': 'os', 're': 're', 'utils': 'demo.utils', 'refs': 'demo.refs', 'JsonRpcStreamReader': 'pyls_jsonrpc.streams.JsonRpcStreamReader'}
Through the dependency graph, we know the location of each directly referenced dependency:
import_path_map: {'ast': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/ast.pyi', 'astpretty': '/Users/zhangxindong/Desktop/search/code/sempy/venv/lib/python3.9/site-packages/astpretty.py', 'os': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/os/__init__.pyi', 're': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/re.pyi', 'utils': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/demo/utils.py', 'refs': '/Users/zhangxindong/Desktop/search/code/sempy/sempy/demo/refs/__init__.py', 'streams': '/Users/zhangxindong/Desktop/search/code/sempy/venv/lib/python3.9/site-packages/pyls_jsonrpc/streams.py'}
Next, we parse the corresponding files. The requirement is to obtain the return types of certain methods. For pyi files, pytype can parse them for us; we then match the results against the call relationships.
print('\n\n\nBy using pytype, parse the AST of pyi files to analyze the return types of third-party dependencies and infer the types of the current variables.\n\n')
# Use pytype to parse the dependent pyi files and obtain the return types of the called methods
fname = '/path/to/parsed_file'
with open(fname, 'r') as reader:
    sourcecode = reader.read()
ret = parser.parse_string(sourcecode, filename=fname, python_version=3)
constant_map = dict()
function_map = dict()
for key in import_path_map.keys():
    v = import_path_map[key]
    with open(v, 'r') as reader:
        src = reader.read()
    try:
        res = parser.parse_pyi(src, v, key, 3)
    except:
        continue
    # Aliases and classes in the pyi are ignored here; record constants and function signatures
    for constant in res.constants:
        constant_map[constant.name] = constant.type.name
    for function in res.functions:
        signatures = function.signatures
        sig_list = []
        for signature in signatures:
            sig_list.append((signature.params, signature.return_type))
        function_map[function.name] = sig_list
var_type_from_pyi_list = []
for alias in ret.aliases:
    variable_name = alias.name
    if alias.type is not None:
        typename_in_source = alias.type.name
        typename = typename_in_source
        # The import may use an alias; convert it to the real module name
        if '.' not in typename:
            # A plain alias rather than the return value of a function call: ignore it
            continue
        if typename.split('.')[0] in alias_map:
            real_module_name = alias_map[typename.split('.')[0]]
            typename = real_module_name + typename[typename.index('.'):]
        if typename in function_map:
            possible_return_types = [item[1].name for item in function_map[typename]]
            # print('The possible return type of', typename_in_source, 'is', possible_return_types)
            var_type_from_pyi_list.append((variable_name, possible_return_types))
        if typename in constant_map:
            possible_return_type = constant_map[typename]
            var_type_from_pyi_list.append((variable_name, possible_return_type))
For example:
pattern = re.compile(r'.*')
In the /Users/zhangxindong/Desktop/search/code/sempy/sempy/typeshed/stdlib/re.pyi
file, we load two signatures. Both are for re.compile, with different input parameters, and both return the Pattern type.
From this, we know that the type of the pattern variable is re.Pattern.
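In simplified form (the exact signatures depend on the typeshed version), the two overloads in re.pyi look roughly like this, which is why both resolve to Pattern:

@overload
def compile(pattern: AnyStr, flags: int = ...) -> Pattern[AnyStr]: ...
@overload
def compile(pattern: Pattern[AnyStr], flags: int = ...) -> Pattern[AnyStr]: ...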
Some of these Python syntax analysis capabilities have been applied to code document search and recommendation and to smart code completion in Alibaba Cloud Dev Studio.
If developers do not know how to use an API (for example, how to call it or what parameters it takes), they can hover the pointer over the API to view the summary provided by the smart coding plug-in. Clicking "API documentation" opens the detailed information (such as the official API documentation and code samples) in the right-hand bar, and developers can also search for the required API documentation directly. Document search and recommendation are supported for JavaScript and Python.
During document collection, we record the API name and the class it belongs to. In the code, syntax analysis tells us the class of the object a method is called on, and that information is used for the document search.
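A minimal sketch of that idea, using only the standard ast module (the sample source and names are hypothetical, not the plug-in's actual implementation):

import ast

source = "import os as abcdefg\ncwd = abcdefg.getcwd()"
alias_map = {'abcdefg': 'os'}  # built earlier from the import statements

tree = ast.parse(source)
for node in ast.walk(tree):
    # For a call like abcdefg.getcwd(), recover the real module and method name
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
        base = node.func.value
        if isinstance(base, ast.Name) and base.id in alias_map:
            api_name = alias_map[base.id] + '.' + node.func.attr
            print(api_name)  # -> os.getcwd, used as the key for the document search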
When writing code, the smart coding plug-in automatically perceives the code context and provides developers with precise code completion candidates. The candidates marked with ✨ are results from smart code completion. Currently, this feature is available for Java, JavaScript, and Python.
Through syntax analysis during code completion, the class of a user variable can be determined more accurately, which helps filter out unreasonable candidates recommended by the deep learning models, and additional reasonable candidates can be recalled from the class's set of methods.
The concepts and tools around Python static types are fairly mature. However, because of the heavy legacy burden and the limited driving force in the community, adoption is still limited. In addition, the Python core team, major vendors, and individual IDEs each have their own implementation and analysis approaches, with no unified standard or format. You can choose a suitable parsing approach based on the advantages and disadvantages above and on your tool set and data set. We look forward to more support for static types from the Python community.