Usage tutorial

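This topic describes how to use Hive or HadoopMR to access tables in Table Store.

Data preparation

Prepare a data table named pet in Table Store, with name as its only primary key column. The sample data is shown in the following table.

Note

Leave the blank cells unwritten. Table Store is schema-free, so a missing value does not need to be stored as NULL.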

name	owner	species	sex	birth	death
Fluffy	Harold	cat	f	1993-02-04
Claws	Gwen	cat	m	1994-03-17
Buffy	Harold	dog	f	1989-05-13
Fang	Benny	dog	m	1990-08-27
Bowser	Diane	dog	m	1979-08-31	1995-07-29
Chirpy	Gwen	bird	f	1998-09-11
Whistler	Gwen	bird		1997-12-09
Slim	Benny	snake	m	1996-04-29
Puffball	Diane	hamster	f	1999-03-30

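Hive access example

You can add HADOOP_HOME and HADOOP_CLASSPATH to /etc/profile, as in the following example:

export HADOOP_HOME=${YourHadoopInstallationDirectory}
export HADOOP_CLASSPATH=emr-tablestore-1.4.2.jar:tablestore-4.3.1-jar-with-dependencies.jar:joda-time-2.9.4.jar

Run the bin/hive command to enter Hive, and then create an external table, as in the following example: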

CREATE EXTERNAL TABLE pet
  (name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
  STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
  WITH SERDEPROPERTIES(
    "tablestore.columns.mapping"="name,owner,species,sex,birth,death")
  TBLPROPERTIES (
    "tablestore.endpoint"="YourEndpoint",
    "tablestore.access_key_id"="YourAccessKeyId",
    "tablestore.access_key_secret"="YourAccessKeySecret",
    "tablestore.table.name"="pet");

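The configuration items are described below.

Configuration item	Description

WITH SERDEPROPERTIES

Field mapping configuration, which includes the tablestore.columns.mapping option.

By default, the field names of the external table are the column names (primary key column names or attribute column names) of the table in Table Store. Sometimes the two differ, for example when handling case sensitivity or character set issues. In that case, specify tablestore.columns.mapping. The value is a comma-separated string with no spaces around the commas. Each item is a column name in the Table Store table, listed in the same order as the fields of the external table.

Note

Table Store column names may contain whitespace characters, so any whitespace in the value is treated as part of a column name.

TBLPROPERTIES

Table property configuration, which includes the following options:

tablestore.endpoint (required): the endpoint used to access Table Store. You can view the endpoint of an instance in the Table Store console. For more information, see Endpoints.
tablestore.instance (optional): the name of the Table Store instance. If omitted, the first segment of tablestore.endpoint is used. For more information, see Instances.
tablestore.access_key_id (required): the AccessKey ID of an Alibaba Cloud account or RAM user. For more information, see Obtain an AccessKey.
To access resources temporarily through STS, set this parameter to the AccessKey ID of the temporary access credentials.
tablestore.access_key_secret (required): the AccessKey Secret of an Alibaba Cloud account or RAM user. For more information, see Obtain an AccessKey.
To access resources temporarily through STS, set this parameter to the AccessKey Secret of the temporary access credentials.
tablestore.sts_token (optional): the security token of the temporary access credentials. Set this parameter only when you access resources temporarily through STS. For more information, see Grant permissions to a RAM user by using a RAM policy.
tablestore.table.name (required): the name of the corresponding table in Table Store.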
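
Two of the rules above are easy to get wrong: whitespace inside tablestore.columns.mapping, and the default value of tablestore.instance. The following plain-Java sketch illustrates both. It does not call the connector. splitMapping and instanceFromEndpoint are hypothetical helpers that mimic the described behavior (splitting on commas without trimming, and taking the first dot-separated segment of the endpoint host), and the instance name and region in the endpoint are made up for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers illustrating the rules described above; not connector code.
final class TableStoreConfigSketch {

    // Split a tablestore.columns.mapping value on commas WITHOUT trimming,
    // because whitespace counts as part of a Table Store column name.
    static List<String> splitMapping(String mapping) {
        return Arrays.asList(mapping.split(",", -1));
    }

    // Assumed default for tablestore.instance: the first dot-separated
    // segment of the endpoint's host name.
    static String instanceFromEndpoint(String endpoint) {
        String host = URI.create(endpoint).getHost();
        int dot = host.indexOf('.');
        return dot < 0 ? host : host.substring(0, dot);
    }

    public static void main(String[] args) {
        // Correct form: no spaces around the commas.
        System.out.println(splitMapping("name,owner,species"));
        // A space after the comma becomes part of the second column name.
        System.out.println(splitMapping("name, owner"));
        // Illustrative endpoint; "myinstance" is a made-up instance name.
        System.out.println(instanceFromEndpoint("https://myinstance.cn-hangzhou.ots.aliyuncs.com"));
    }
}
```

In the second call the second item is " owner" with a leading space, which would not match a Table Store column named owner.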

Query data in the table.

Run SELECT * FROM pet; to query all rows in the table.

Sample output:

Bowser  Diane   dog     m       1979-08-31      1995-07-29
Buffy   Harold  dog     f       1989-05-13      NULL
Chirpy  Gwen    bird    f       1998-09-11      NULL
Claws   Gwen    cat     m       1994-03-17      NULL
Fang    Benny   dog     m       1990-08-27      NULL
Fluffy  Harold  cat     f       1993-02-04      NULL
Puffball        Diane   hamster f       1999-03-30      NULL
Slim    Benny   snake   m       1996-04-29      NULL
Whistler        Gwen    bird    NULL    1997-12-09      NULL
Time taken: 5.045 seconds, Fetched 9 row(s)

Run SELECT * FROM pet WHERE birth > "1995-01-01"; to query the rows whose birth value is greater than 1995-01-01.

Sample output:

Chirpy  Gwen    bird    f       1998-09-11      NULL
Puffball        Diane   hamster f       1999-03-30      NULL
Slim    Benny   snake   m       1996-04-29      NULL
Whistler        Gwen    bird    NULL    1997-12-09      NULL
Time taken: 1.41 seconds, Fetched 4 row(s)
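
Because birth is declared as STRING in the external table, the predicate birth > "1995-01-01" is a string comparison. It agrees with chronological order only because the values are zero-padded dates in yyyy-MM-dd form. The following plain-Java sketch, which does not involve Hive and uses illustrative class and method names, reproduces the filter on the sample data with String.compareTo:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BirthFilterSketch {

    // Sample name -> birth pairs copied from the pet table.
    static Map<String, String> sampleBirths() {
        Map<String, String> births = new LinkedHashMap<>();
        births.put("Fluffy", "1993-02-04");
        births.put("Claws", "1994-03-17");
        births.put("Buffy", "1989-05-13");
        births.put("Fang", "1990-08-27");
        births.put("Bowser", "1979-08-31");
        births.put("Chirpy", "1998-09-11");
        births.put("Whistler", "1997-12-09");
        births.put("Slim", "1996-04-29");
        births.put("Puffball", "1999-03-30");
        return births;
    }

    // Return the names whose birth string sorts strictly after the threshold.
    static List<String> bornAfter(Map<String, String> births, String threshold) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> entry : births.entrySet()) {
            if (entry.getValue().compareTo(threshold) > 0) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [Chirpy, Whistler, Slim, Puffball]
        System.out.println(bornAfter(sampleBirths(), "1995-01-01"));
    }
}
```

The sketch returns the same four pets as the Hive query, in insertion order rather than primary key order. If the dates were stored without zero padding (for example 1996-4-29), the string comparison would no longer be reliable.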

HadoopMR access example

The following example shows how to use a HadoopMR program to count the rows in the pet data table.

Build the mapper and reducer

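import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
// PrimaryKeyWritable and RowWritable are provided by the TableStore-Hadoop Connector JAR;
// import them from that JAR's package.

public class RowCounter {
    // Emits ("TOTAL", 1) for every row read from Table Store.
    public static class RowCounterMapper
            extends Mapper<PrimaryKeyWritable, RowWritable, Text, LongWritable> {
        private final static Text agg = new Text("TOTAL");
        private final static LongWritable one = new LongWritable(1);

        @Override
        public void map(
                PrimaryKeyWritable key, RowWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(agg, one);
        }
    }

    // Sums the counts received for each key.
    public static class IntSumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        public void reduce(
                Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable val : values) {
                sum += val.get();
            }
            context.write(key, new LongWritable(sum));
        }
    }
}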

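The data source calls the mapper's map() once for each row that it reads from Table Store. The PrimaryKeyWritable and RowWritable parameters correspond to the primary key and the content of that row, respectively. Call PrimaryKeyWritable.getPrimaryKey() and RowWritable.getRow() to obtain the primary key object and the row object defined by the Table Store Java SDK.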
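
The data flow of this job can be simulated without Hadoop or Table Store. The plain-Java sketch below emits one ("TOTAL", 1) pair per row, as RowCounterMapper does, and then sums the values per key, as IntSumReducer does. The class name and the in-memory list that stands in for the table are assumptions made for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RowCountSimulation {

    // "Map" phase: one ("TOTAL", 1) pair per input row; the row content is ignored.
    static List<Map.Entry<String, Long>> mapPhase(List<String> rows) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String ignored : rows) {
            pairs.add(new SimpleEntry<>("TOTAL", 1L));
        }
        return pairs;
    }

    // "Reduce" phase: sum all values that share the same key.
    static Map<String, Long> reducePhase(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // The nine primary keys of the sample pet table.
        List<String> petNames = Arrays.asList(
                "Fluffy", "Claws", "Buffy", "Fang", "Bowser",
                "Chirpy", "Whistler", "Slim", "Puffball");
        // Prints {TOTAL=9}
        System.out.println(reducePhase(mapPhase(petNames)));
    }
}
```

Because every row maps to the same key, a single reduce call sees all the counts, which is why the real job produces one TOTAL line.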

Configure Table Store as the data source of the mapper.

private static RangeRowQueryCriteria fetchCriteria() {
    RangeRowQueryCriteria res = new RangeRowQueryCriteria("YourTableName");
    res.setMaxVersions(1);
    List<PrimaryKeyColumn> lower = new ArrayList<PrimaryKeyColumn>();
    List<PrimaryKeyColumn> upper = new ArrayList<PrimaryKeyColumn>();
    lower.add(new PrimaryKeyColumn("YourPkeyName", PrimaryKeyValue.INF_MIN));
    upper.add(new PrimaryKeyColumn("YourPkeyName", PrimaryKeyValue.INF_MAX));
    res.setInclusiveStartPrimaryKey(new PrimaryKey(lower));
    res.setExclusiveEndPrimaryKey(new PrimaryKey(upper));
    return res;
}

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "row count");
    job.addFileToClassPath(new Path("hadoop-connector.jar"));
    job.setJarByClass(RowCounter.class);
    job.setMapperClass(RowCounterMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    job.setInputFormatClass(TableStoreInputFormat.class);
    TableStoreInputFormat.setEndpoint(job, "https://YourInstance.Region.ots.aliyuncs.com/");
    TableStoreInputFormat.setCredential(job, "YourAccessKeyId", "YourAccessKeySecret");
    TableStoreInputFormat.addCriteria(job, fetchCriteria());
    FileOutputFormat.setOutputPath(job, new Path("output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

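The example uses job.setInputFormatClass(TableStoreInputFormat.class) to set Table Store as the data source. In addition, you need to do the following:

Deploy hadoop-connector.jar to the cluster and add it to the classpath. addFileToClassPath() specifies the local path of hadoop-connector.jar. The code assumes that hadoop-connector.jar is in the current directory.
Specify the endpoint and identity used to access Table Store. Use TableStoreInputFormat.setEndpoint() and TableStoreInputFormat.setCredential() to set the endpoint and the AccessKey information.
Specify the table to count.
Note
- Each call to addCriteria() adds one RangeRowQueryCriteria object, as defined by the Java SDK, to the data source. You can call addCriteria() multiple times. These RangeRowQueryCriteria objects are subject to the same restrictions as the ones used by the GetRange operation of the Table Store Java SDK.
- Use setFilter() and addColumnsToGet() of RangeRowQueryCriteria to filter out unnecessary rows and columns on the Table Store server side. This reduces the amount of data accessed, lowers cost, and improves performance.
- Add multiple RangeRowQueryCriteria objects for different tables to perform a union of those tables.
- Add multiple RangeRowQueryCriteria objects for the same table to split the data more evenly. TableStore-Hadoop Connector also splits the ranges that you pass in into finer ranges based on its own strategies.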
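
One way to picture several RangeRowQueryCriteria objects for the same table is as a partition of the primary key space into adjacent half-open ranges, matching the inclusive start and exclusive end used in fetchCriteria(). The plain-Java sketch below computes such ranges from a list of split points. In a real job, each range would become one RangeRowQueryCriteria passed to addCriteria(). The KeyRange class, the split points, and the use of null to stand for PrimaryKeyValue.INF_MIN and PrimaryKeyValue.INF_MAX are all assumptions made for illustration. The connector's own splitting strategy is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeyRangeSketch {

    // A half-open range [start, end); null means unbounded on that side.
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + (start == null ? "INF_MIN" : start)
                    + ", " + (end == null ? "INF_MAX" : end) + ")";
        }
    }

    // Turn sorted split points into adjacent ranges that cover the whole key space.
    static List<KeyRange> split(List<String> sortedSplitPoints) {
        List<KeyRange> ranges = new ArrayList<>();
        String previous = null;
        for (String point : sortedSplitPoints) {
            ranges.add(new KeyRange(previous, point));
            previous = point;
        }
        ranges.add(new KeyRange(previous, null));
        return ranges;
    }

    public static void main(String[] args) {
        // Prints [[INF_MIN, G), [G, P), [P, INF_MAX)]
        System.out.println(split(Arrays.asList("G", "P")));
    }
}
```

Because each range ends exactly where the next one starts, no row is read twice and none is skipped, so a row count over all ranges equals the count over the full range.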

Run the program

Set HADOOP_CLASSPATH and run the program.

HADOOP_CLASSPATH=hadoop-connector.jar bin/hadoop jar row-counter.jar

Run find output -type f to list all files in the output directory.
Sample output:
```
output/_SUCCESS
output/part-r-00000
output/._SUCCESS.crc
output/.part-r-00000.crc
```
Run cat output/part-r-00000 to view the row count in the result.
```
TOTAL   9
```

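Data type conversion

The data types supported by Table Store are not exactly the same as those supported by Hive or Spark.

The following table shows which conversions from Table Store data types (rows) to Hive or Spark data types (columns) are supported.

Type conversion	TINYINT	SMALLINT	INT	BIGINT	FLOAT	DOUBLE	BOOLEAN	STRING	BINARY
INTEGER	Supported, with loss of precision	Supported, with loss of precision	Supported, with loss of precision	Supported	Supported, with loss of precision	Supported, with loss of precision	Not supported	Not supported	Not supported
DOUBLE	Supported, with loss of precision	Supported, with loss of precision	Supported, with loss of precision	Supported, with loss of precision	Supported, with loss of precision	Supported	Not supported	Not supported	Not supported
BOOLEAN	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Supported	Not supported	Not supported
STRING	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Supported	Not supported
BINARY	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Not supported	Supported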
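
The "loss of precision" entries can be pictured as ordinary numeric narrowing. As a rough illustration in plain Java, assuming the conversions resemble Java primitive casts (this page does not state how the connector implements them), the following sketch shows what can happen when a 64-bit INTEGER or DOUBLE value is read into a narrower or differently shaped type. The class and method names are illustrative.

```java
final class NarrowingSketch {

    // 64-bit integer -> 8-bit integer: only the low 8 bits survive.
    static byte toTinyInt(long value) {
        return (byte) value;
    }

    // 64-bit integer -> 32-bit integer: only the low 32 bits survive.
    static int toInt(long value) {
        return (int) value;
    }

    // 64-bit integer -> 32-bit float: only 24 bits of significand are available.
    static float toFloat(long value) {
        return (float) value;
    }

    // 64-bit integer -> 64-bit double: only 53 bits of significand are available.
    static double toDouble(long value) {
        return (double) value;
    }

    // 64-bit double -> 64-bit integer: the fractional part is dropped.
    static long toBigInt(double value) {
        return (long) value;
    }

    public static void main(String[] args) {
        System.out.println(toTinyInt(300L));              // 44
        System.out.println(toInt(4294967297L));           // 1
        System.out.println(toFloat(16777217L));           // 1.6777216E7
        System.out.println(toDouble(9007199254740993L));  // 9.007199254740992E15
        System.out.println(toBigInt(3.99));               // 3
    }
}
```

If the values in a column can exceed the range of the target type, declaring the external table field as BIGINT for INTEGER columns and DOUBLE for DOUBLE columns avoids these losses, since those are the conversions the table lists as supported without loss of precision.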
