Data science is a discipline that makes data useful. It covers three important areas: statistics, machine learning, and data mining/analysis.
If you look back at the early history of the term data science, you will find two themes are closely connected:
As a result, data science emerged. People used to think of data scientists as statisticians who knew how to code. That description no longer seems accurate. First, let us return to data science itself.
In 2003, the "Data Science Journal" stated: "The so-called 'data science' refers to any data-related content." I agree with this; today, nothing can be separated from data.
Since then, definitions of data science have kept emerging, such as Conway's Venn diagram and the classic views of Mason and Wiggins.
The definition of data science on Wikipedia is closer to what I teach to students:
Data science is a concept that combines statistics, data analysis, machine learning, and related methods, and aims to use data to "understand and analyze" actual phenomena.
Simply put, data science is a discipline that makes data useful.
If you don’t know what decision you have to make, the best approach is to look for inspiration. This is what is called data mining, data analysis, descriptive analysis, exploratory data analysis, or knowledge discovery.
Unless you know how you will make your decision, start by looking for inspiration. The method is very simple: think of the data set as a pile of negatives you found in a darkroom. Data mining means working the equipment to develop all the pictures as quickly as possible, so that you can see whether any of them is inspiring. As with photos, don’t take what you see too seriously. You didn't take these photos, so you don't know much about what lies outside the frame. The golden rule of data mining is: only draw conclusions about what you can see, never about what you can't see. For that, you need statistics and more professional knowledge.
Beyond that, speed is what counts. Expertise in data mining is judged by how quickly you can examine your data. Don't fixate on things that merely seem interesting.
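To make the idea of a fast first look concrete, here is a minimal sketch in Python using only the standard library. The data set (`daily_orders`) and the helper name `quick_look` are made up for illustration; they are not from any particular tool. The point is only to describe what is in front of you, not to generalize beyond it.

```python
import statistics

# Hypothetical data set: daily order counts (made-up numbers for illustration).
daily_orders = [12, 15, 11, 14, 13, 40, 12, 16]


def quick_look(values):
    """Return a few descriptive statistics for a fast first pass over the data."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


summary = quick_look(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A gap between the mean and the median, as in this toy data, is the kind of thing that might be worth a second look. It says something about these eight days only, not about days you have not seen.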
Inspiration is easy to obtain, but rigor is difficult to achieve. If you want to master data, you need specialized training. Having majored in statistics as both an undergraduate and a graduate student, I think statistical inference (statistics for short) is the most difficult and most philosophical of these three fields. It takes a lot of time to do it well.
If you plan to make high-quality, risk-controlled decisions, and those decisions depend on more than the data you happen to have, you need to add statistical skills to your analysis team.
When the situation is uncertain, perhaps statistics can change your mind.
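As one small illustration of how statistics handles uncertainty, the sketch below runs an exact permutation test on the difference between two group means. This is just one possible technique, chosen because it needs nothing beyond the Python standard library; the two groups and the function name `permutation_p_value` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way of splitting the pooled values into two groups of
    the original sizes and returns the share of splits whose mean difference
    is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical measurements for two versions of a page (made-up numbers).
old_page = [10, 12, 11, 13, 12]
new_page = [15, 17, 16, 18, 16]

p_value = permutation_p_value(old_page, new_page)
print(f"p-value: {p_value:.4f}")
```

A small p-value means a difference this large would rarely show up if the group labels were arbitrary. Whether that is enough to change your mind depends on the decision and the risk you are willing to accept.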
In essence, machine learning uses examples rather than instructions to implement operations. I have also written some articles about machine learning, including how machine learning is different from artificial intelligence, how to get started with machine learning, the experience and lessons of using machine learning in enterprises, and introducing children to supervised learning.
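The contrast between instructions and examples can be sketched in a few lines of Python. The first function below encodes a rule written by hand; the second copies the label of the closest labeled example (a 1-nearest-neighbor lookup, about the simplest learning method there is). The heights, labels, and threshold are all made up for illustration.

```python
def rule_based_label(height_cm):
    """Instructions: a person writes the rule explicitly (hypothetical threshold)."""
    return "tall" if height_cm >= 180 else "short"


def nearest_neighbor_label(examples, height_cm):
    """Examples: return the label of the closest labeled example (1-nearest neighbor)."""
    _, closest_label = min(examples, key=lambda pair: abs(pair[0] - height_cm))
    return closest_label


# Hypothetical labeled examples: (height in cm, label).
labeled_examples = [(150, "short"), (160, "short"), (185, "tall"), (195, "tall")]

for height in (158, 190):
    print(height, rule_based_label(height), nearest_neighbor_label(labeled_examples, height))
```

In the second function nobody wrote down where "tall" begins. The boundary is implied by the examples, so changing the examples changes the behavior without touching the code.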
Over the last decade, demand for data has grown substantially, and multiple data processing systems have surfaced for processing, analysis, and development. For instance, IDC estimated in 2016 that by 2020, the digital universe would have grown to 70 zettabytes. The amount of data produced in 2021 is expected to surpass this number by a wide margin.
Big data, data warehousing, data analytics, and even the smallest demands for information today are all handled through data and its manipulation. Part 1 of this six-part article series focuses on data analytics solutions by Alibaba Cloud.
Data powers intelligent business. As the market is maturing and enterprises are adopting various data analytics products and solutions, coherent data integration becomes a new challenge. Alibaba Cloud’s Data Analytics and AI solutions help you build a unified platform with full data analytic capabilities to streamline your data pipeline and create a consistent user experience throughout the complete data lifecycle. Alibaba Cloud provides industry solutions and applications to embed these data analytic capabilities into your business processes and professional Big Data Consulting Services to help lower total cost of ownership (TCO) and make your data analytics journey easier.