PolarDB: Pearson Correlation Coefficient

Last Updated: Mar 22, 2024

This topic describes the Pearson correlation coefficient, which measures the linear correlation between two features. The coefficient ranges from -1 to 1. The larger its absolute value, the stronger the linear correlation.
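For reference, the standard textbook definition of the coefficient for n paired observations is given below. The symbols r, x_i, y_i, and n are notation introduced here for illustration and are not names from the PolarDB syntax.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Here \bar{x} and \bar{y} are the sample means of the two variables.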

Scenarios

The Pearson correlation coefficient applies to variables that meet the following requirements:

  • Neither variable has a standard deviation of 0.

  • The variables are continuous and linearly related.

  • The variables follow a bivariate normal distribution, or a unimodal distribution that is approximately normal.
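The first requirement follows from the definition: the standard deviations appear in the denominator, so a constant variable leaves the coefficient undefined. The following is a minimal Python sketch of that check. The function name `pearson` is hypothetical, and the code illustrates the statistic itself, not how PolarDB computes it.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations; a value of 0 means that variable is constant.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("standard deviation is 0; the coefficient is undefined")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / math.sqrt(ss_x * ss_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```

Calling `pearson([5, 5, 5], [1, 2, 3])` raises `ValueError` because the first variable never changes.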

The Pearson correlation coefficient is commonly used to determine whether two features in a machine learning model are linearly related. If two features are highly correlated, they carry largely the same information and may be interchangeable. In this case, you can discard one of them to keep the model effective.
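The feature-selection idea above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of PolarDB: the function names, the 0.95 threshold, and the sample data are all hypothetical, and the rule of keeping the first feature of each correlated pair is only one possible choice.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def redundant_features(columns, threshold=0.95):
    """Return the names of features to drop.

    `columns` maps a feature name to its list of values. `threshold` is a
    hypothetical cutoff on |r|; choose a value that suits your model.
    """
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) >= threshold:
            dropped.add(b)  # keep `a`, discard the later feature
    return dropped


# Hypothetical data: x2 is almost a linear function of x1, x3 is not.
data = {
    "x1": [10, 20, 30, 40, 50],
    "x2": [21, 39, 61, 80, 99],
    "x3": [3, 1, 4, 1, 5],
}
print(redundant_features(data))
```

With this data, only `x2` is reported as redundant, because its correlation with `x1` is close to 1 while `x3` is only weakly correlated with `x1`.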

Syntax

CREATE FEATURE feature_name WITH ( feature_class = '', x_cols = '', parameters=()) AS (SELECT select_expr [, select_expr] ... FROM table_reference)

Parameter description:

Parameter

Description

Example

feature_name

The name of the feature.

pearson_001

feature_class

The type of the feature. Set the value to pearson.

pearson

x_cols

The columns used to create the feature. Each column must be of a floating-point or integer type. Separate multiple columns with commas (,).

dx1,dx2

parameters

Custom parameters for creating the feature. The following parameters are supported:

  • null_strategy: specifies how to replace NULL values. The following values are supported:

    • mean: replaces NULL values with the average value.

    • median: replaces NULL values with the median value.

  • categorical_feature: the categorical columns. Separate multiple columns with commas (,). Use this parameter to exclude columns in x_cols that are not of an integer or floating-point type.

categorical_feature='dx3'

select_expr

The name of the column used to create the feature.

dx4

table_reference

The name of the table containing the column used to create the feature.

airlines_test_1000
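The two null_strategy values described in the parameter table can be illustrated with a short Python sketch. The function `fill_nulls` is hypothetical. The sketch assumes that the replacement is computed per column from the non-NULL values, which this topic does not state explicitly.

```python
from statistics import mean, median


def fill_nulls(values, null_strategy="mean"):
    """Replace None entries with the mean or median of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-NULL values to compute a replacement")
    if null_strategy == "mean":
        fill = mean(present)
    elif null_strategy == "median":
        fill = median(present)
    else:
        raise ValueError(f"unsupported null_strategy: {null_strategy!r}")
    return [fill if v is None else v for v in values]


column = [1, None, 2, 9]
print(fill_nulls(column, "mean"))    # NULL is replaced by 4, the mean of 1, 2, 9
print(fill_nulls(column, "median"))  # NULL is replaced by 2, the median of 1, 2, 9
```

The median is less sensitive to outliers such as the value 9 in this column, which is why the two strategies give different replacements.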

Example

/*polar4ai*/CREATE FEATURE pearson_001 WITH ( feature_class = 'pearson',x_cols='Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',parameters=(null_strategy='mean',categorical_feature='Airline,Flight,AirportFrom,AirportTo,DayOfWeek')) AS (SELECT * FROM airlines_test_1000);