The fourth episode of The Open-Source Folks Talk - Big Data & AI Special Session was held during the Apsara Conference 2022. Feng Wang (Vice Chairman of Alibaba Open-Source Committee's Big Data AI Field) and Xing Shi (Head of Alibaba Cloud AI Open-Source Project EasyRec) shared the stories behind popular open-source projects. Several guests discussed hot topics and pain points in open-source, including Hongshu (CTO and Founder of OSCHINA), Lidong Dai (Co-Founder of Beluga Open-Source), Junbo Zhao (Ph.D. Supervisor at Zhejiang University), Yipeng Wang (Editor-in-Chief of InfoQ), and Yu Li (member of Apache Software Foundation).
At the beginning of the event, Feng Wang (Vice Chairman of Alibaba's Open-Source Committee in the Big Data AI Field) gave a talk titled "In the cloud-native age, do not forget the original intention of open-source." He described how the way developers participate in open-source has changed in the cloud-native era and introduced open-source projects that grow on the cloud. Among them, Apache Flink is the key project of Alibaba Cloud's big data and AI open-source work. Feng Wang also highlighted, “Alibaba opens its technological innovation to the community to benefit more developers and expects to attract more developers to promote community development.”
(Speech by Feng Wang – Vice Chairman of Alibaba's Open-Source Committee in the Big Data AI Field)
Next, Xing Shi (Head of Alibaba Cloud's AI Open-Source Project EasyRec) gave a talk titled "AI Inclusiveness, Ali AI's open-source history and thinking." Today, AI is ubiquitous in daily life and is continuously providing inclusive capabilities. The Ali AI open-source family provides open-source AI capabilities that connect the complete process from scenario application to production and development.
Talking about the original intention of open-source, he said he wants to build on open-source by combining it with Alibaba's real-world application scenarios, extend it further, and return the results to the open-source community. To that end, the team will continue its open-source work at the platform, algorithm, application, and resource scheduling levels so more developers can benefit from Alibaba's experience in practical scenarios. He also hopes that more developers will participate in the open-source community. Finally, he hopes that more open-source products can be integrated with the cloud to become practical, easy to use, and more secure, making AI open-source more inclusive in the digital world.
(Speech by Xing Shi – Head of Alibaba Cloud AI Open-Source Project EasyRec)
Finally, the host, Ruobing Li (Head of the Alibaba Cloud Developer Community), held a dialogue with Hongshu (Founder and CTO of OSCHINA), Lidong Dai (Co-Founder of Beluga Open-Source), Yu Li (member of Apache Software Foundation), Yipeng Wang (Editor-in-Chief of InfoQ), and Junbo Zhao (Ph.D. Supervisor at Zhejiang University) to discuss Big Data in the Cloud Era & AI Open-Source.
(The following is a summary of the roundtable discussion.)
(Round Table Dialogue)
Host: Could you talk about your original intention in joining open-source?
Hongshu: I used many open-source projects when I first started development. However, before 2001, open-source in China was still in its infancy, and projects were scattered. Therefore, I hoped to establish a platform that brings all open-source resources together and makes them easy for developers to find. We have witnessed the vigorous development of open-source in China for more than ten years, and its speed has far exceeded our imagination.
Host: Mr. Li, as a veteran figure in the open-source field and the person in charge of commercial products, what was your original intention when participating in open-source?
Yu Li: Working for ideals is worthy of admiration, but working for a livelihood should be accepted and understood. My earliest work at IBM was related to open-source big data components, so I have a deeper understanding of them. In addition, I think the open-source community can bring great value to technical people. I hope to communicate and discuss with more people and improve myself.
Host: Is there any difference between the open-source atmosphere in China and abroad?
Junbo Zhao: Within universities, the atmosphere abroad is better than in China, but this does not mean that domestic developers are not as good as foreign developers. There is a consensus both in China and abroad: open-source is cool. Currently, domestic universities provide students with comprehensive support, including teachers, counselors, academic tutors, scientific research tutors, and employment tutors. On the one hand, this ensures the normal operation of the school. On the other hand, if we help students too much, it reduces their initiative to some extent and makes them accustomed to waiting passively for guidance. The mentality of foreign or new-generation students may be more open.
Host: How can you judge whether an open-source project can enter incubation or succeed in the future?
Lidong Dai: Generally speaking, projects with a specific user base, projects that have been open-sourced for a period, and projects that have maintained normal iteration in the community are more likely to enter the incubator. In addition, Apache appreciates innovation, original projects, and projects that can solve some pain points. Over 95% of authors of open-source projects in China have no experience in operating projects, but once they enter the incubator, they can get a lot of help from mentors. With more than 20 years of successful experience accumulated by Apache, the projects are more likely to succeed.
Host: What is the definition of "success" in open-source projects?
Hongshu: Different projects in different fields have different standards. For example, database software and frontend libraries belong to different fields and are not comparable. Even if a frontend project receives a lot of attention, it does not mean it is more successful than a database project. Therefore, I think whether a project is successful depends on whether it meets the sponsor's expectations. If it does, it can be considered a successful open-source project. In addition, judging projects by star count is not an objective method. For example, some tutorial projects on GitHub have gained tens of thousands of stars, but this only means they are useful. It does not mean the project is successful.
Therefore, we have introduced the Code Cloud Index to judge a project from five perspectives: code popularity, the number of contributors, the handling of issues, etc. Version 2.0 is in production. It is expected to analyze projects on Gitee and GitHub at the same time, with a more detailed process, and it will be implemented in a completely open-source way to help developers observe the development of a project from more angles.
Host: Flink has a great influence on big data open-source projects. In the process of development, which steps did it get right to get to where it is today?
Yu Li: From the perspective of investors, profitability is the criterion for judging whether an open-source project is successful. From the perspective of users, the criterion is whether it is easy to use and helps solve practical problems. From the perspective of developers, the technical challenge is the criterion.
Host: What are the current core problems that AI open-source needs to solve?
Junbo Zhao: Open-source has done a great deal for AI, and AI has also done a great deal for open-source. Earlier, writing a project from scratch often required tens of thousands of lines of code. With the open-source community, a lot of content can be obtained directly from the community and put to use. In addition, in terms of data, I hope to be able to decentralize data from a more open-source perspective, integrating privacy protection mechanisms and federated learning. For example, AI for Science needs a huge amount of money, which cannot be provided by one person or organization. I hope a more global open-source community will gather data together to form a new business model and then give back to each contributor, so that the community can function successfully. The foundation of AI cannot be separated from the efficient flow of data. I hope everyone can contribute data, everyone can share data, and everyone can use data for in-depth analysis.
Host: What should a company pay the most attention to when applying open-source technology? How can the company avoid the occurrence of negative events?
Lidong Dai: When you choose an open-source project, you must first consider whether it can solve your problem. At the same time, you need to consider the maturity of the product (such as which stage the product is in, how many versions have been released, and whether it is an Apache top-level project). Pay attention to its ecosystem (such as whether the community is active enough). In addition, it is necessary to watch for vulnerabilities to avoid high-risk events (for example, by using a tool such as Cloud Security Scanner from a professional company).
Host: What kind of model is behind the big data technology maturity assessment? What factors are more important in the evaluation process?
Yipeng Wang: It is difficult to objectively judge the maturity of a project through data statistics alone. Our evaluation model divides the maturity of open-source projects into several parts. First, code health accounts for about 40%, including closed issues, closed PRs, and PR values. Second, community ecology accounts for about 40%, including community contribution and community developer popularity. Third, the project collaboration influence index contributed by X-lab accounts for about 10%. For example, if a group of people who participate deeply in the open-source community join project A as developers, the weight of project A will increase. At the same time, the algorithm prevents a project from receiving an extremely high weight simply because multiple experts have joined it. Fourth, the number of stars accounts for about 10%.
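The weighting described above can be sketched as a simple weighted sum. This is only an illustration, not the actual evaluation model: the function name, the dictionary keys, and the assumption that each sub-score has already been normalized to the range 0 to 1 are all hypothetical, and the weights are the approximate percentages quoted in the discussion.

```python
# Approximate weights quoted in the discussion (they sum to 1.0).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "code_health": 0.40,              # closed issues, closed PRs, PR values
    "community_ecology": 0.40,        # community contribution, developer popularity
    "collaboration_influence": 0.10,  # collaboration influence index
    "stars": 0.10,                    # number of stars
}


def maturity_score(sub_scores):
    """Combine sub-scores into one weighted maturity score.

    Assumption: every sub-score is already normalized to [0, 1].
    How each sub-score is computed is not covered by this sketch.
    """
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= sub_scores[name] <= 1.0:
            raise ValueError(f"sub-score {name!r} must be in [0, 1]")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Made-up sub-scores for a fictional project, for illustration only.
example = {
    "code_health": 0.8,
    "community_ecology": 0.5,
    "collaboration_influence": 0.6,
    "stars": 0.9,
}
print(round(maturity_score(example), 2))
```

Because stars carry only about a tenth of the weight in this sketch, a project with many stars but weak code health or community activity still ends up with a low overall score, which matches the point made earlier in the discussion that star count alone is not an objective measure.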
Host: Open-source today, whether in terms of licenses or overall community development, has matured worldwide. Is there a direction or way for China to catch up and overtake?
Hongshu: The only quantifiable evaluation criterion for the prosperity of the open-source community is the number of well-known projects. Therefore, you first need to increase that number to the same level as other countries. In addition, there is still a tenfold gap in data volume between the open-source projects hosted on Open-Source China and those on GitHub, and the gap in open-source influence is far greater. Therefore, I am soberly aware that the road ahead for open-source in China is long and difficult. Both quantity and quality need to develop faster to achieve the overtaking.
Host: Please leave a message for young students.
Yipeng Wang: Devote yourself to open-source and participate in open-source projects across language barriers and cultural differences. If you can find happiness in open-source, that is also a reward that open-source projects give back to you.
Yu Li: Open-source is a convenient form of social practice for young students. There are many experienced, first-class developers in the open-source community who enthusiastically provide free help to newcomers. In addition, open-source projects are real projects that are applied in production. By participating in the open-source community, students can experience in advance the difference between development in production and development in the laboratory. At the same time, contributing code to open-source projects is an excellent plus on a resume.
Lidong Dai: Statistics from Open-Source China show that many open-source projects last less than two years. I think that apart from passion, persistence is important, and we should treat open-source as a lifelong career. On the first day of DolphinScheduler, we decided to turn it into an international open-source project and to hold team members to international standards, such as writing issues and emails in English. If you stick to it for 3-5 years, you will eventually see yourself differently.
Hongshu: The necessity and advantages of open-source no longer need to be demonstrated. We only need to work firmly in this direction and strive to make some contributions to the community.
Junbo Zhao: I hope young students can be more active. If you actively submit PRs, even if they are not adopted, you can grow and make progress from the feedback given by the community. These are valuable experiences that cannot be obtained in school.
(Group Photo)