SAGEMAKER DEMO 1 – Fixed Deposit

As a sales manager at a bank, you are asked to increase the bank's revenue by finding customers who will open a fixed deposit. Instead of cold calling every client, can we use machine learning to identify the most promising customers?

Step 1: Create an Amazon SageMaker notebook instance

Firstly, you create the notebook instance that you use to download and process your data. As part of the creation process, you also create an Identity and Access Management (IAM) role that allows Amazon SageMaker to access data in Amazon Simple Storage Service (Amazon S3).


a. Sign in to the Amazon SageMaker console, and in the top right corner, select your preferred AWS Region. This tutorial uses the ap-east-1 (Hong Kong) Region.


b. In the left navigation pane, choose Notebook instances, then choose Create notebook instance.

c. On the Create notebook instance page, in the Notebook instance settings box, fill in the following fields:

  • For Notebook instance name, type SageMaker-Tutorial.
  • For Notebook instance type, choose ml.t3.medium.
  • For Platform identifier, keep the default selection (Amazon Linux 2, Jupyter Lab 3).

d. In the Permissions and encryption section, for IAM role, choose Create a new role, and in the Create an IAM role dialog box, select Any S3 bucket and choose Create role.


Note: If you already have a bucket that you'd like to use instead, choose Specific S3 buckets and specify the bucket name; otherwise, choose Any S3 bucket.

Amazon SageMaker creates a role named AmazonSageMaker-ExecutionRole-YYYYMMDDTHHmmSS, stamped with the current date and time.


e. Keep the default settings for the remaining options and choose Create notebook instance.

In the Notebook instances section, the new SageMaker-Tutorial notebook instance is displayed with a Status of Pending. The notebook is ready when the Status changes to InService.
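
If you prefer to poll the status programmatically rather than refreshing the console, a minimal check with boto3 (a sketch run from any environment with AWS credentials for the same Region; the instance name below assumes you used SageMaker-Tutorial) looks like this:

import boto3

# look up the current status of the notebook instance
status = boto3.client('sagemaker').describe_notebook_instance(
    NotebookInstanceName='SageMaker-Tutorial')['NotebookInstanceStatus']
print(status)  # 'Pending' at first, then 'InService' when ready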

Step 2: Prepare the data

In this step, you use your Amazon SageMaker notebook instance to preprocess the data that you need to train your machine learning model and then upload the data to Amazon S3.

a. After your SageMaker-Tutorial notebook instance status changes to InService, choose Open Jupyter.


b. In Jupyter, choose New and then choose conda_python3.


c. In a new code cell on your Jupyter notebook, copy and paste the following code and choose Run.

This code imports the required libraries and defines the variables you need to prepare the data, and to train and deploy the ML model.

# import libraries
import boto3, re, sys, math, json, os, sagemaker, urllib.request
from sagemaker import get_execution_role
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
from IPython.display import display
from time import gmtime, strftime
from sagemaker.serializers import CSVSerializer

# Define IAM role
role = get_execution_role()
prefix = 'sagemaker/DEMO-xgboost-dm'
my_region = boto3.session.Session().region_name # set the region of the instance
# this line automatically looks for the XGBoost image URI and builds an XGBoost container
xgboost_container = sagemaker.image_uris.retrieve("xgboost", my_region, "latest")

print("Success - the MySageMakerInstance is in the " + my_region + " region. You will use the " + xgboost_container + " container for your SageMaker endpoint.")

Output:

Success - the MySageMakerInstance is in the ap-east-1 region. You will use the 286214385809.dkr.ecr.ap-east-1.amazonaws.com/xgboost:latest container for your SageMaker endpoint.

d. Create the S3 bucket to store your data. Copy and paste the following code into the next code cell and choose Run.

Note: Make sure to replace the placeholder bucket name your-s3-bucket-name with a unique S3 bucket name. If you don't receive a success message after running the code, change the bucket name and try again.
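
The bucket-creation cell should look roughly like the sketch below. It reuses boto3 and the my_region variable from the previous cell; us-east-1 is special-cased because that Region rejects an explicit LocationConstraint:

bucket_name = 'your-s3-bucket-name' # <--- change this to a unique bucket name
s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name,
                         CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error: ', e)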

e. Download the data to your SageMaker instance and load the data into a dataframe. Copy and paste the following code into the next code cell and choose Run.

try:
    urllib.request.urlretrieve("https://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv", "bank_clean.csv")
    print('Success: downloaded bank_clean.csv.')
except Exception as e:
    print('Data load error: ', e)

try:
    model_data = pd.read_csv('./bank_clean.csv', index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ', e)

Output:

Success: downloaded bank_clean.csv.
Success: Data loaded into dataframe.

f. Shuffle and split the data into training data and test data. Copy and paste the following code into the next code cell and choose Run.

The training data (70% of customers) is used during the model training loop. You use gradient-based optimization to iteratively refine the model parameters. Gradient-based optimization is a way to find model parameter values that minimize the model error, using the gradient of the model loss function.

The test data (remaining 30% of customers) is used to evaluate the performance of the model and measure how well the trained model generalizes to unseen data.

train_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data))])
print(train_data.shape, test_data.shape)

Output:

(28831, 61) (12357, 61)

Check the training data and test data by evaluating each dataframe in its own cell:

train_data

test_data
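
As a conceptual aside, "gradient-based optimization" just means repeatedly nudging a parameter against the gradient of the loss. The toy sketch below minimizes a one-parameter squared loss this way; it is only an illustration of the idea, not what XGBoost (which fits an ensemble of trees by gradient boosting) runs internally:

# toy gradient descent: find w minimizing loss(w) = (w - 3)**2
w = 0.0                     # initial parameter value
learning_rate = 0.1
for step in range(100):
    gradient = 2 * (w - 3)  # derivative of (w - 3)**2 with respect to w
    w -= learning_rate * gradient
print(w)                    # converges towards 3.0, the minimizer of the loss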

Step 3: Train the ML model

In this step, you use your training dataset to train your machine learning model.

a. In a new code cell on your Jupyter notebook, copy and paste the following code and choose Run.

This code reformats the header and first column of the training data and then loads the data from the S3 bucket. This step is required to use the Amazon SageMaker pre-built XGBoost algorithm.

bucket_name = 'lab-d7c1ba26' # change this to the bucket name you created in Step 2
pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
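
As an optional sanity check (not part of the original lab), you can confirm that train.csv now has the y_yes label in the first column and no header row:

check = pd.read_csv('train.csv', header=None)
print(check.shape)        # expect (28831, 60): y_yes plus the 59 feature columns
print(check[0].unique())  # the first column should only contain the labels 0 and 1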

b. Set up the Amazon SageMaker session, create an instance of the XGBoost model (an estimator), and define the model’s hyperparameters. Copy and paste the following code into the next code cell and choose Run.

sess = sagemaker.Session()
xgb = sagemaker.estimator.Estimator(
    xgboost_container, role, instance_count=1, instance_type='ml.m5.xlarge',
    output_path='s3://{}/{}/output'.format(bucket_name, prefix), sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4, min_child_weight=6,
                        subsample=0.8, silent=0, objective='binary:logistic', num_round=100)

c. Start the training job. Copy and paste the following code into the next code cell and choose Run.

This code trains the model using gradient-based optimization on an ml.m5.xlarge instance. After a few minutes, you should see training logs being generated in your Jupyter notebook.

xgb.fit({'train': s3_input_train})
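
When the training job completes, the serialized model is written under the output_path you configured. As an optional check, you can print the exact S3 location of the artifact:

print(xgb.model_data)  # s3://<your-bucket>/sagemaker/DEMO-xgboost-dm/output/.../model.tar.gz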

Step 4: Deploy the model

In this step, you deploy the trained model to an endpoint, reformat and load the CSV data, then run the model to create predictions.

a. In a new code cell on your Jupyter notebook, copy and paste the following code and choose Run.

This code deploys the model on a server and creates a SageMaker endpoint that you can access. This step may take a few minutes to complete.

xgb_predictor = xgb.deploy(initial_instance_count=1,instance_type='ml.m5.xlarge')
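
The deploy call returns a predictor object bound to the new endpoint. If you want to confirm that the endpoint is live before sending traffic (an optional check), you can describe it by name:

# look up the endpoint that deploy() created
desc = boto3.client('sagemaker').describe_endpoint(EndpointName=xgb_predictor.endpoint_name)
print(desc['EndpointStatus'])  # 'InService' once the endpoint is ready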

b. To predict whether customers in the test data enrolled for the bank product or not, copy the following code into the next code cell and choose Run.



test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values #load the data into an array
xgb_predictor.serializer = CSVSerializer() # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # turn the comma-separated string into an array
print(predictions_array.shape)

Output:

(12357,)
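
Each value in predictions_array is the model's estimated probability that the corresponding customer will enroll. To connect this back to the business goal of deciding whom to call first, one illustrative follow-up (not part of the original lab) is to rank the test customers by that probability:

# attach the predicted enrollment probability to each test customer and rank
ranked = test_data.copy()
ranked['p_enroll'] = predictions_array
print(ranked.sort_values('p_enroll', ascending=False)['p_enroll'].head(10))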

Step 5: Evaluate model performance

In this step, you evaluate the performance and accuracy of the machine learning model.

In a new code cell on your Jupyter notebook, copy and paste the following code and choose Run. This code compares the actual and predicted values in a table called a confusion matrix.

Based on the prediction, you can conclude that the model predicted enrollment for a certificate of deposit accurately for 90% of customers in the test data, with a precision of 65% (278/429) for enrolled and 90% (10,785/11,928) for didn't enroll.

cm = pd.crosstab(index=test_data['y_yes'], columns=np.round(predictions_array), rownames=['Observed'], colnames=['Predicted'])
tn = cm.iloc[0,0]; fn = cm.iloc[1,0]; tp = cm.iloc[1,1]; fp = cm.iloc[0,1]; p = (tp+tn)/(tp+tn+fp+fn)*100
print("\n{0:<20}{1:<4.1f}%\n".format("Overall Classification Rate: ", p))
print("{0:<15}{1:<15}{2:>8}".format("Predicted", "No Purchase", "Purchase"))
print("Observed")
print("{0:<15}{1:<2.0f}% ({2:<}){3:>6.0f}% ({4:<})".format("No Purchase", tn/(tn+fn)*100,tn, fp/(tp+fp)*100, fp))
print("{0:<16}{1:<1.0f}% ({2:<}){3:>7.0f}% ({4:<}) \n".format("Purchase", fn/(tn+fn)*100,fn, tp/(tp+fp)*100, tp))
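
Because tn, fn, tp, and fp are already defined in the cell above, you can also print precision and recall for the enrolled class explicitly (a small addition; the lab code above prints the rates in a different layout):

precision = tp / (tp + fp)  # of customers predicted to enroll, the share that actually did
recall    = tp / (tp + fn)  # of customers who actually enrolled, the share the model caught
print('Precision: {:.1%}  Recall: {:.1%}'.format(precision, recall))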

Step 6: Clean up

In this step, you terminate the resources you used in this lab.

Important: Terminating resources that are not actively being used reduces costs and is a best practice. Not terminating your resources will result in charges to your account.

a. Delete your endpoint: In your Jupyter notebook, copy and paste the following code and choose Run.

xgb_predictor.delete_endpoint(delete_endpoint_config=True)

b. Delete your training artifacts and S3 bucket: In your Jupyter notebook, copy and paste the following code and choose Run.

bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
bucket_to_delete.objects.all().delete()
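
Note that this removes the objects but leaves the now-empty bucket in place. If you also want to remove the bucket itself, one extra call does it:

bucket_to_delete.delete()  # delete the emptied bucket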

c. Delete your SageMaker notebook: stop and then delete your SageMaker notebook instance.

  • Open the SageMaker console.
  • Under Notebooks, choose Notebook instances.
  • Choose the notebook instance that you created for this tutorial, then choose Actions, Stop. The notebook instance may take several minutes to stop. When the Status changes to Stopped, move on to the next step.
  • Choose Actions, then Delete.
  • Choose Delete.
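
If you prefer to script this cleanup, the same stop-wait-delete sequence can be done with boto3 (a sketch assuming the instance name SageMaker-Tutorial from Step 1; run it from any environment with AWS credentials, not from the notebook you are deleting):

import boto3

sm = boto3.client('sagemaker')
sm.stop_notebook_instance(NotebookInstanceName='SageMaker-Tutorial')
# block until the instance reports Stopped, then delete it
sm.get_waiter('notebook_instance_stopped').wait(NotebookInstanceName='SageMaker-Tutorial')
sm.delete_notebook_instance(NotebookInstanceName='SageMaker-Tutorial')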

Source Code

Download the notebook here.

Download the result notebook here.
