Data ingestion from OSS with DataWorks is user-friendly and can be done end to end through a web-based interface. This lets customers, especially business users, ingest data quickly and simply, so they can focus their time and effort on more important tasks such as running big data computations.
In this article, we will show you how to perform data ingestion from Alibaba Cloud's Object Storage Service (OSS) with DataWorks.
After you have prepared the OSS bucket, follow the procedure to integrate data from OSS.
For more detailed information about how to prepare an OSS bucket for data ingestion and how to configure DataWorks Data Integration, see MaxCompute Data Ingestion from OSS.
In this blog series, we will walk you through the entire cycle of Big Data analytics. Now that we are familiar with the basics and with cluster creation, it is time to understand the data acquired from various sources and the most suitable format for ingesting it into a Big Data environment.
In this article, we will take a closer look at the concepts and usage of HDFS and Sqoop for data ingestion.
We will take a deep dive into HDFS, the storage layer of Hadoop and one of the world’s most reliable storage systems. Distributed storage and data replication are the major features that make HDFS a fault-tolerant storage system. The features that make HDFS suitable for running large datasets on commodity hardware are fault tolerance, high availability, reliability, and scalability.
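To make the link between replication and fault tolerance concrete, here is a minimal sketch in Python. It is a toy model, not HDFS code: the function names `place_replicas` and `readable_blocks`, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS uses a rack-aware placement policy). The only detail taken from HDFS itself is the default replication factor of 3.

```python
import itertools


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct datanodes (toy round-robin)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo the node count give distinct nodes
        # as long as replication <= len(datanodes).
        placement[block] = {
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(replication)
        }
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return {block for block, nodes in placement.items() if nodes - failed}


# Hypothetical five-node cluster storing eight blocks.
datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(num_blocks=8, datanodes=datanodes, replication=3)

# With three replicas per block, any two simultaneous node failures
# still leave every block readable.
survives_two_failures = all(
    len(readable_blocks(placement, pair)) == 8
    for pair in itertools.combinations(datanodes, 2)
)
print(survives_two_failures)
```

In this model a block becomes unreadable only when every node holding one of its replicas fails at the same time, which is why a higher replication factor tolerates more simultaneous failures at the cost of more storage.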
Use the Data Integration feature of DataWorks to create data synchronization tasks that import data into and export data from MaxCompute.
This topic describes how to create a Log Service source table in Realtime Compute. It also describes the attribute fields, WITH parameters, and field type mapping involved in the table creation process.
Log Service is an all-in-one real-time data logging service that Alibaba Group has developed and tested in many big data scenarios. Based on Log Service, you can quickly finish tasks such as data ingestion, consumption, delivery, query, and analysis without any extra development work. This can help you improve O&M and operational efficiency, and build up the capability to process large amounts of logs in the data technology era.
Alibaba Cloud Elasticsearch is a cloud-based service that offers built-in integrations such as Kibana, commercial features, Alibaba Cloud VPC, Cloud Monitor, and Resource Access Management. It can securely ingest data from any source and search, analyze, and visualize it in real time. With Pay-As-You-Go billing, Alibaba Cloud Elasticsearch costs 30% less than self-built solutions and saves you the hassle of maintaining and scaling your platform.
DataWorks is a Big Data platform product launched by Alibaba Cloud. It provides one-stop Big Data development, data permission management, offline job scheduling, and other features. It supports data integration, MaxCompute SQL, MaxCompute MR, machine learning, and shell tasks.
2,599 posts | 762 followers
Alibaba Cloud Indonesia - August 28, 2020
Data Geek - March 12, 2021
Alibaba Clouder - December 29, 2020
Alibaba Cloud Community - November 30, 2021
Alibaba Cloud Project Hub - March 19, 2024
Alibaba Cloud MaxCompute - March 25, 2021
Data Integration is an all-in-one data synchronization platform. The platform supports online real-time and offline data exchange between all data sources, networks, and locations.