Platform For AI: Responsible AI: error analysis

Last Updated: Jun 14, 2024

Responsible AI is crucial for AI model developers and enterprise customers. It plays an important role in model development, training, fine-tuning, evaluation, and deployment to ensure that AI models are safe, stable, fair, and socially ethical. Data Science Workshop (DSW) of Alibaba Cloud Platform for AI (PAI) allows you to integrate Responsible AI tools to perform fairness analysis, error analysis, and interpretability analysis on your AI models.

How it works

Error analysis is an important step in understanding and improving model performance. It systematically identifies, analyzes, and fixes errors in model predictions to improve the accuracy and fairness of the model. The following list describes the principles of error analysis:

  • Error identification: Identify inaccurate predictions generated by the model. The system compares the predictions of the model with the true values to identify inconsistent cases. Errors can be categorized into different types, such as false positives and false negatives. A minimal sketch follows this list.

  • Error categorization: Categorize errors based on their attributes. Error categorization helps identify the root causes of the errors, such as data imbalance, insufficient features, or model bias. This process requires domain knowledge and human judgment.

  • Cause analysis: Analyze the causes behind each error category. Cause analysis has a significant impact on model optimization and may involve analyzing data quality and fixing issues related to model design, feature engineering, or data representation.

  • Improvement operations: Based on the results of the error analysis, you can perform operations to resolve the issues of the model. The operations may include cleaning data, rebalancing the dataset, modifying the model architecture, introducing new features, or using different algorithms.

  • Iteration and evaluation: Error analysis is a continuous iterative process. Each time you modify the model, you need to perform error analysis to evaluate the impact of the changes, monitor improvements in model performance, and identify new issues.

  • Documentation and reporting: To ensure transparency and explainability, we recommend that you document the error analysis process, including the identified issues and corrective actions performed. This helps team members understand the limitations of the model and provides insights to improve other project stages.
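
The error identification step can be illustrated with a few lines of code. The following sketch is a minimal example that compares hypothetical predictions with hypothetical true labels by using scikit-learn and counts the false positives and false negatives. The arrays are made-up values, not data from this topic.

from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions for a binary task.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"False positives: {fp}, false negatives: {fn}")
print(f"Error rate: {(fp + fn) / len(y_true):.2%}")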

This topic describes how to use the responsible-ai-toolbox to perform error analysis on a model in DSW. In this topic, a model that predicts whether an individual's annual income exceeds 50K is evaluated.

Prepare the environment and resources

  • DSW instance: If you do not have a DSW instance, create a DSW instance. For more information, see Create a DSW instance. We recommend that you use the following configurations:

    • Instance type: ecs.gn6v-c8g1.2xlarge

    • Image: Python 3.9 or later. In this topic, the official image used is tensorflow-pytorch-develop:2.14-pytorch2.1-gpu-py311-cu118-ubuntu22.04.

    • Model: responsible-ai-toolbox supports regression and binary classification models that are based on the Sklearn, PyTorch, and TensorFlow frameworks.

  • Training dataset: We recommend that you use your own dataset. If you want to use a sample dataset, see the "Step 3: Prepare the dataset" section of this topic.

  • Algorithm model: We recommend that you use your own algorithm model. If you want to use a sample algorithm model, see the "Step 5: Train the model" section of this topic.

Step 1: Go to the DSW Gallery

  1. Log on to the PAI console.

  2. In the upper-left corner, select a region based on your business requirements.

  3. In the left-side navigation pane, choose Big Data and AI Experience > DSW Gallery. Search for Responsible AI error analysis and click Open in DSW.

  4. Select an AI workspace and a DSW instance and click OK. The system opens the Responsible AI error analysis notebook.

Step 2: Import the dependency packages

Install the responsible-ai-toolbox dependency package (raiwidgets) for subsequent evaluation.

!pip install raiwidgets==0.34.1

Import the dependency packages of Responsible AI and Sklearn for subsequent training.

# Import the Responsible AI-related and Sklearn dependency packages.

import zipfile

import pandas as pd
import sklearn
from lightgbm import LGBMClassifier
from packaging import version
from raiutils.dataset import fetch_dataset
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder

Step 3: Prepare the dataset

Download and decompress the required datasets. The package includes the training data named adult-train.csv and test data named adult-test.csv.

# Specify the name of the dataset file.
outdirname = 'responsibleai.12.28.21'
zipfilename = outdirname + '.zip'

# Download the dataset and decompress the file.
fetch_dataset('https://publictestdatasets.blob.core.windows.net/data/' + zipfilename, zipfilename)
with zipfile.ZipFile(zipfilename, 'r') as unzip:
    unzip.extractall('.')

Step 4: Preprocess data

  1. Load the training data named adult-train.csv and the test data named adult-test.csv.

  2. Split the training data and test data into feature variables and a target variable. The target variable is the actual result that the model predicts, and the feature variables are the remaining variables in each data instance. In this example, the target variable is income, and the feature variables include workclass, education, marital-status, and the other columns.

  3. Convert the target variables of the training data and test data to the numpy format for training.

# Load training data and test data.
train_data = pd.read_csv('adult-train.csv', skipinitialspace=True)
test_data = pd.read_csv('adult-test.csv', skipinitialspace=True)


# Specify the target column and the categorical feature columns.
target_feature = 'income'
categorical_features = ['workclass', 'education', 'marital-status',
                        'occupation', 'relationship', 'race', 'gender', 'native-country']

# Specify a function to split the feature variables and the target variable.
def split_label(dataset, target_feature):
    X = dataset.drop([target_feature], axis=1)
    y = dataset[[target_feature]]
    return X, y


# Split the feature variables and the target variable.
X_train_original, y_train = split_label(train_data, target_feature)
X_test_original, y_test = split_label(test_data, target_feature)


# Convert to the numpy format.
y_train = y_train[target_feature].to_numpy()
y_test = y_test[target_feature].to_numpy()

# Specify the test sample.
test_data_sample = test_data.sample(n=500, random_state=5)

You can also load your own dataset. The following sample code shows how to load a dataset in the CSV format. The file name is a placeholder that you must replace with the path of your file.

import pandas as pd

# Load your dataset in the CSV format. Replace the placeholder file name with
# the path of your CSV file.
filename = 'your-dataset.csv'
try:
    data = pd.read_csv(filename)
except FileNotFoundError as error:
    print(f'Failed to load the dataset: {error}')

Step 5: Train the model

In this example, Sklearn is used to define a data training pipeline and train a binary classification model.

# Define the ohe_params parameter based on the Sklearn version.
if version.parse(sklearn.__version__) < version.parse('1.2'):
    ohe_params = {"sparse": False}
else:
    ohe_params = {"sparse_output": False}

# Define the classification pipeline for feature conversion. The X parameter represents the training data.
def create_classification_pipeline(X):
    pipe_cfg = {
        'num_cols': X.dtypes[X.dtypes == 'int64'].index.values.tolist(),
        'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),
    }
    num_pipe = Pipeline([
        ('num_imputer', SimpleImputer(strategy='median')),
        ('num_scaler', StandardScaler())
    ])
    cat_pipe = Pipeline([
        ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),
        ('cat_encoder', OneHotEncoder(handle_unknown='ignore', **ohe_params))
    ])
    feat_pipe = ColumnTransformer([
        ('num_pipe', num_pipe, pipe_cfg['num_cols']),
        ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])
    ])

    pipeline = Pipeline(steps=[('preprocessor', feat_pipe),
                               ('model', LGBMClassifier(random_state=0))])

    return pipeline
    
# Create a classification model training pipeline.
pipeline = create_classification_pipeline(X_train_original)

# Train the model.
model = pipeline.fit(X_train_original, y_train)

Step 6: Add the Responsible AI component

Run the following script to add the error analysis component of Responsible AI and compute the analysis results by using rai_insights.

# Import RAI dashboard components.
from raiwidgets import ResponsibleAIDashboard
from responsibleai import RAIInsights

# Define the RAIInsights object.
from responsibleai.feature_metadata import FeatureMetadata
feature_metadata = FeatureMetadata(categorical_features=categorical_features, dropped_features=[])
rai_insights = RAIInsights(model, train_data, test_data_sample, target_feature, 'classification',
                           feature_metadata=feature_metadata)

# Add an error analysis component.
rai_insights.error_analysis.add()

# Compute the analysis results by using RAI.
rai_insights.compute()

Step 7: Create a Responsible AI dashboard

  1. Create data groups based on various filtering conditions for error analysis. Examples:

    • Age is less than 65 and hours-per-week is greater than 40 hours.

    • Marital status is Never-married or Divorced.

    • Data group index is less than 20.

    • Predicted Y is greater than 50K.

    • True Y is greater than 50K.

  2. Introduce ResponsibleAIDashboard to analyze the model by using the responsible-ai-toolbox.

from raiutils.cohort import Cohort, CohortFilter, CohortFilterMethods
import os
from urllib.parse import urlparse

# Age is less than 65 and hours-per-week is greater than 40 hours.
cohort_filter_age = CohortFilter(
    method=CohortFilterMethods.METHOD_LESS,
    arg=[65],
    column='age')
cohort_filter_hours_per_week = CohortFilter(
    method=CohortFilterMethods.METHOD_GREATER,
    arg=[40],
    column='hours-per-week')

user_cohort_age_and_hours_per_week = Cohort(name='Cohort Age and Hours-Per-Week')
user_cohort_age_and_hours_per_week.add_cohort_filter(cohort_filter_age)
user_cohort_age_and_hours_per_week.add_cohort_filter(cohort_filter_hours_per_week)

# Marital status is Never-married or Divorced.
cohort_filter_marital_status = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=["Never-married", "Divorced"],
    column='marital-status')

user_cohort_marital_status = Cohort(name='Cohort Marital-Status')
user_cohort_marital_status.add_cohort_filter(cohort_filter_marital_status)

# Data group index is less than 20.
cohort_filter_index = CohortFilter(
    method=CohortFilterMethods.METHOD_LESS,
    arg=[20],
    column='Index')

user_cohort_index = Cohort(name='Cohort Index')
user_cohort_index.add_cohort_filter(cohort_filter_index)

# Predicted Y is greater than 50K.
cohort_filter_predicted_y = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=['>50K'],
    column='Predicted Y')

user_cohort_predicted_y = Cohort(name='Cohort Predicted Y')
user_cohort_predicted_y.add_cohort_filter(cohort_filter_predicted_y)

# True Y is greater than 50K.
cohort_filter_true_y = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=['>50K'],
    column='True Y')

user_cohort_true_y = Cohort(name='Cohort True Y')
user_cohort_true_y.add_cohort_filter(cohort_filter_true_y)

cohort_list = [user_cohort_age_and_hours_per_week,
               user_cohort_marital_status,
               user_cohort_index,
               user_cohort_predicted_y,
               user_cohort_true_y]

# Create a Responsible AI dashboard.
metric_frame_tf = ResponsibleAIDashboard(rai_insights, cohort_list=cohort_list, feature_flights="dataBalanceExperience")

# Configure the URL of the dashboard based on the DSW environment.
metric_frame_tf.config['baseUrl'] = 'https://{}-proxy-{}.dsw-gateway-{}.data.aliyun.com'.format(
    os.environ.get('JUPYTER_NAME').replace("dsw-", ""),
    urlparse(metric_frame_tf.config['baseUrl']).port,
    os.environ.get('dsw_region'))

Step 8: Access the Responsible AI dashboard to view error analysis

Click the URL that is displayed in the output to access the Responsible AI dashboard.
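
If the output does not show a clickable link, you can also print the URL that was configured in the previous step and open it manually. For example:

# Print the dashboard URL that was configured in the previous step.
print(metric_frame_tf.config['baseUrl'])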

Error analysis:

Tree map

  1. Click Tree map and select Error rate in the Select metric section to perform error analysis. The tree map splits the data based on the model features by using a binary tree. For example, the following binary branches appear under the root node:

    • marital-status == Married-civ-spouse (54/224)

    • marital-status != Married-civ-spouse (18/276)

  2. This example includes 500 samples with 72 prediction errors, which is an error rate of 14.4%. Each node displays the total number of samples, the number of prediction errors, and the error rate for the data that matches the branch conditions (see the sketch after this list).

  3. Take note of red nodes. A darker color indicates a higher error rate.

  4. In this example, the darkest red leaf node shows a 43.40% error rate on the binary tree path that meets the following conditions:

    • marital-status == Married-civ-spouse

    • fnlwgt <= 207583

    • hours-per-week > 40.5
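
You can also reproduce the node statistics of the tree map outside the dashboard. The following sketch reuses the model, test_data_sample, and target_feature objects from the previous steps to compute the error rate of the marital-status == Married-civ-spouse subgroup. The exact numbers depend on the sampled test data.

# Reuse the trained pipeline and the sampled test data from the previous steps.
X_sample = test_data_sample.drop(columns=[target_feature])
y_sample = test_data_sample[target_feature].to_numpy()
predictions = model.predict(X_sample)

# Compute the error rate of the marital-status == Married-civ-spouse subgroup.
mask = (X_sample['marital-status'] == 'Married-civ-spouse').to_numpy()
errors = (predictions[mask] != y_sample[mask]).sum()
print(f"{errors} errors out of {mask.sum()} samples "
      f"({errors / mask.sum():.2%} error rate)")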

Heat map

  1. Click Heat map and select Error rate in the Select metric section to perform error analysis.

  2. Optional. You can configure the following parameters:

    • Quantile binning: a method of dividing a continuous variable into intervals that each contain the same number of data points. A pandas sketch at the end of this section illustrates the difference between the two strategies.

      • OFF: Disable quantile binning. The default equal-width strategy is used, in which all intervals have the same length.

      • ON: Enable quantile binning. Each interval contains the same number of data points.

    • Binning threshold: the number of intervals into which the data is split. Adjust the threshold to change the number of intervals. In this example, the default value 8 is used, which splits age and hours-per-week into 8 intervals each.

  3. In the heat map, you can select two features for cross data analysis. In this example, age and hours-per-week are selected for heat map analysis.

  4. Take note of red nodes. A darker color indicates a higher error rate.

  5. The analysis shows that the highest error rate (100%) exists in the following feature ranges:

    • age [71.8, 80.9], hours-per-week [39.0, 51.0]

    • age [44.4, 53.5], hours-per-week [75.0, 87.0]

    • age [16.9, 26.1], hours-per-week [63.0, 75.0]

    • ...
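
To make the difference between the two binning strategies concrete, the following sketch bins the age column of the test sample with pandas. pd.cut produces equal-width intervals, which corresponds to quantile binning turned off, and pd.qcut produces intervals that each contain roughly the same number of samples, which corresponds to quantile binning turned on. The column name and the bin count of 8 mirror this example; the sketch is only an illustration and is not part of the dashboard.

import pandas as pd

# Equal-width bins (quantile binning OFF): every interval has the same length.
equal_width = pd.cut(test_data_sample['age'], bins=8)

# Quantile bins (quantile binning ON): every interval holds roughly the same
# number of samples.
equal_freq = pd.qcut(test_data_sample['age'], q=8, duplicates='drop')

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())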