Two Sample T Test


The Two Sample T Test component checks whether the means of two populations differ significantly, based on one sample drawn from each population. This topic describes how to configure the Two Sample T Test component provided by Machine Learning Designer (formerly known as Machine Learning Studio) and provides an example of how to use the component.

Configure the component

You can use one of the following methods to configure the Two Sample T Test component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Two Sample T Test component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). The following table describes the parameters.

Tab	Parameter	Description
Fields Setting	Sample 1 Column	The column that contains Sample 1.
Fields Setting	Sample 2 Column	The column that contains Sample 2.
Parameters Setting	T Test Type	The type of T test to perform. Valid values: Independent T Test (checks whether the population means of two independent samples differ significantly; the two samples must be independent of each other and are generally assumed to follow a normal distribution) and Paired T Test (checks whether the population means of two paired samples differ significantly).
Parameters Setting	Alternative Hypothesis Type	The type of alternative hypothesis. Valid values: two.sided (the difference between the two population means is not equal to the hypothesized mean), less (the difference is less than the hypothesized mean), and greater (the difference is greater than the hypothesized mean).
Parameters Setting	Confidence Level	The confidence level of the test result. Valid values: 0.8, 0.9, 0.95, 0.99, 0.995, and 0.999.
Parameters Setting	Hypothesized Mean	The hypothesized mean. Default value: 0.
Parameters Setting	Variances of Two Populations Are Equal	Specifies whether the variances of two populations are equal. Valid values: true and false.
Parameters Setting	Cores	The number of cores. The value must be a positive integer. This parameter must be used with the Memory Size Per Core parameter. Valid values: 1 to 9999.
Parameters Setting	Memory Size Per Core	The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: 1024 to 65536.
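
The T Test Type and Variances of Two Populations Are Equal settings correspond to three textbook statistics. The following Python sketch shows those formulas only; it is not the component's implementation. The function name t_statistic and its arguments are hypothetical, the sample values are illustrative, and treating unequal variances as Welch's T test is an assumption.

```python
import math
import statistics


def t_statistic(x, y, paired=False, var_equal=False, mu=0.0):
    """Return (t, df) for a two-sample T test using textbook formulas."""
    if paired:
        if len(x) != len(y):
            raise ValueError("paired samples must have the same length")
        # A paired test is a one-sample test on the pairwise differences.
        diffs = [a - b for a, b in zip(x, y)]
        n = len(diffs)
        se = statistics.stdev(diffs) / math.sqrt(n)
        return (statistics.mean(diffs) - mu) / se, n - 1

    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    diff = statistics.mean(x) - statistics.mean(y) - mu
    if var_equal:
        # Pooled variance: assumes both populations share one variance.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        return diff / se, n1 + n2 - 2
    # Welch's T test (assumed here): variances are estimated separately.
    a, b = v1 / n1, v2 / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / math.sqrt(a + b), df


x = [1, 1, 1, 0, 0]
y = [2, 3, 4, 3, 4]
t_pooled, df_pooled = t_statistic(x, y, var_equal=True)
t_welch, df_welch = t_statistic(x, y)
t_paired, df_paired = t_statistic(x, y, paired=True)
print(t_pooled, df_pooled)
print(t_welch, df_welch)
print(t_paired, df_paired)
```

With equal sample sizes, the pooled and Welch statistics have the same t value and differ only in their degrees of freedom.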

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

pai -name t_test 
    -project algo_public 
    -DxTableName=pai_t_test_all_type
    -DxColName=col1_double
    -DxTablePartitions=ds=2010/dt=1
    -DyTableName=pai_t_test_all_type
    -DyColName=col1_double
    -DyTablePartitions=ds=2010/dt=1 
    -DoutputTableName=pai_t_test_out
    -Dalternative=less
    -Dmu=47
    -DconfidenceLevel=0.95
    -Dpaired=false
    -DvarEqual=true

Parameter	Required	Description	Default value
xTableName	Yes	The name of Input Table x.	N/A
xTablePartitions	No	The partitions in Input Table x that are used in the T test. Supported formats: partition_name=value for a single partition, and name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,).	All partitions
xColName	Yes	The column in Input Table x that is used in the T test. The value must be of the DOUBLE or INT type.	N/A
yTableName	Yes	The name of Input Table y.	N/A
yTablePartitions	No	The partitions in Input Table y that are used in the T test. Supported formats: partition_name=value for a single partition, and name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,).	All partitions
yColName	Yes	The column in Input Table y that is used in the T test. The value must be of the DOUBLE or INT type.	N/A
paired	No	Specifies the type of T test. Valid values: true (paired T test) and false (independent T test).	false
alternative	No	The type of alternative hypothesis. Valid values: two.sided, less, and greater.	two.sided
mu	No	The hypothesized mean. The value must be of the DOUBLE type.	0
varEqual	No	Specifies whether the variances of two populations are equal. Valid values: true and false.	false
confidenceLevel	No	The confidence level of the test result. Valid values: 0.8, 0.9, 0.95, 0.99, 0.995, and 0.999.	0.95
coreNum	No	The number of cores. The value must be a positive integer. This parameter must be used with the memSizePerCore parameter. Valid values: 1 to 9999.	Determined by the system
memSizePerCore	No	The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: 1024 to 65536.	Determined by the system
lifecycle	No	The lifecycle of the output table.	N/A
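
The parameter table above can be checked before a command is submitted. The following Python sketch is a hypothetical helper, not part of PAI: the name build_t_test_command is an assumption, and the function only validates values against the table and returns the command text in the same layout as the example in this topic. It does not run the command.

```python
REQUIRED = ("xTableName", "xColName", "yTableName", "yColName")
ALTERNATIVES = ("two.sided", "less", "greater")
CONFIDENCE_LEVELS = (0.8, 0.9, 0.95, 0.99, 0.995, 0.999)


def build_t_test_command(params):
    """Validate params against the parameter table and return the command text."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    if params.get("alternative", "two.sided") not in ALTERNATIVES:
        raise ValueError("alternative must be one of: " + ", ".join(ALTERNATIVES))
    if float(params.get("confidenceLevel", 0.95)) not in CONFIDENCE_LEVELS:
        raise ValueError("unsupported confidenceLevel")

    lines = ["pai -name t_test", "    -project algo_public"]
    for name, value in params.items():
        if isinstance(value, bool):
            # The commands in this topic write booleans as lowercase true/false.
            value = "true" if value else "false"
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines)


command = build_t_test_command({
    "xTableName": "pai_t_test_all_type",
    "xColName": "col1_double",
    "yTableName": "pai_t_test_all_type",
    "yColName": "col1_double",
    "outputTableName": "pai_t_test_out",
    "alternative": "less",
    "confidenceLevel": 0.95,
    "paired": False,
})
print(command)
```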

If the input tables are non-partitioned tables, we recommend that you do not set the coreNum and memSizePerCore parameters and instead use the default values determined by the system. If computing resources are limited, use the following code to estimate the number of cores and the memory size of each core:

def CalcCoreNumAndMem(row, centerCount, kOneCoreDataSize=1024):
    """Calculate the number of cores and the memory size of each core.
       Args:
           row: the number of rows in an input table.
           centerCount: the number of columns in an input table. Not used in the calculation.
           kOneCoreDataSize: the amount of data that can be computed by each core. Unit: MB. The value must be a positive integer. Default value: 1024.
       Return:
           coreNum, memSizePerCore
       Example:
           coreNum, memSizePerCore = CalcCoreNumAndMem(1000, 99, kOneCoreDataSize=2048)
    """
    kMBytes = 1024.0 * 1024.0
    # Number of cores: data size in MB (row * 2 * 8 bytes) divided by the data size per core, at least 1.
    coreNum = max(1, int(row * 2 * 8 / kMBytes / kOneCoreDataSize))
    # Memory size per core: twice the data size per core, at least 1024 MB.
    memSizePerCore = max(1024, int(kOneCoreDataSize * 2))
    return coreNum, memSizePerCore
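
The estimate above is not bounded by the valid ranges listed for coreNum (1 to 9999) and memSizePerCore (1024 to 65536). A minimal sketch of one way to bring estimated values into those ranges follows; the helper name clamp_resources is hypothetical and is not part of PAI.

```python
def clamp_resources(core_num, mem_size_per_core):
    """Clamp both values to the ranges listed for coreNum and memSizePerCore."""
    core_num = min(max(int(core_num), 1), 9999)
    mem_size_per_core = min(max(int(mem_size_per_core), 1024), 65536)
    return core_num, mem_size_per_core


print(clamp_resources(20000, 512))  # (9999, 1024)
```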

Example

Test data

create table pai_test_input as
select * from
(
  select 1 as f0,2 as f1
  union all
  select 1 as f0,3 as f1
  union all
  select 1 as f0,4 as f1
  union all
  select 0 as f0,3 as f1
  union all
  select 0 as f0,4 as f1
)tmp;

PAI command

pai -name t_test 
    -project algo_public 
    -DxTableName=pai_test_input
    -DxColName=f0
    -DyTableName=pai_test_input
    -DyColName=f1
    -DoutputTableName=pai_t_test_out
    -Dalternative=less
    -Dmu=47
    -DconfidenceLevel=0.95
    -Dpaired=false
    -DvarEqual=true

Output

The output table contains only one row and one column. The value is a JSON string in the following format.

{
    "AlternativeHypthesis": "difference in means not equals to 0",
    "ConfidenceInterval": "(-2.5465, -0.4535)",
    "ConfidenceLevel": 0.95,
    "alpha": 0.05000000000000004,
    "df": 19,
    "mean of the differences": -1.5,
    "p": 0.008000000000000007,
    "t": -3
}
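
The JSON string can be parsed with any JSON library. The following Python sketch reads the sample output above, extracts the bounds of the confidence interval, and applies the usual decision rule of rejecting the null hypothesis when p is less than alpha. How the string is fetched from the output table is outside the scope of this sketch, and the variable names are illustrative. The key AlternativeHypthesis is spelled as it appears in the sample output.

```python
import json

raw = """{
    "AlternativeHypthesis": "difference in means not equals to 0",
    "ConfidenceInterval": "(-2.5465, -0.4535)",
    "ConfidenceLevel": 0.95,
    "alpha": 0.05000000000000004,
    "df": 19,
    "mean of the differences": -1.5,
    "p": 0.008000000000000007,
    "t": -3
}"""

result = json.loads(raw)

# The interval is stored as a string such as "(-2.5465, -0.4535)".
low, high = (float(part) for part in result["ConfidenceInterval"].strip("()").split(","))

# Usual decision rule: reject the null hypothesis when p < alpha.
reject_null = result["p"] < result["alpha"]

print(result["AlternativeHypthesis"])
print(low, high, reject_null)
```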
