Jose Luis Silva, Ph.D.

Physicist || Ph.D. || Founder
  • Ph.D. in Physics:
    UU 🇸🇪
  • Postdoc in AI:
    LiU 🇸🇪
  • Founder:
    Aicavity Academy 🇸🇪
  • Co-Founder:
    Oxaala LTDA. 🇸🇪 🇧🇷
  • Life:
    Brazilian-Swedish 🇧🇷 🇸🇪
Research Interests:
  • Artificial Intelligence & Machine Learning
  • Graphs, Computer Vision & NLP
  • Data Science, Analytics & Decision Making
  • Deep Learning & Reinforcement Learning

Analyze Datasets and Train ML Models using AutoML

May 13, 2022

Description of the 4 projects:

Perform multiclass classification for sentiment analysis, starting by ingesting product review data into a central repository, an Amazon S3 bucket. Then use Amazon Athena and AWS Glue to analyze the data with interactive queries and visualize the results, which are used during the model development process.

Bias can be present in your data before any model training occurs. Inspecting the dataset for bias can help detect collection gaps, inform your feature engineering, and reveal societal biases the dataset may reflect. Analyze bias in the dataset, generate and analyze a bias report, and prepare the dataset for model training.

Use Amazon SageMaker Autopilot to train a BERT-based natural language processing (NLP) model. The model analyzes customer feedback and classifies the messages into positive (1), neutral (0), and negative (-1) sentiment.

Use the SageMaker BlazingText built-in algorithm to predict the sentiment of each customer review. BlazingText is a variant of FastText, which is based on word2vec.

My Solutions: Practical Data Science Projects from Coursera, DeepLearning.AI and Amazon Web Services

– My Certificate –

ML Pipeline using Amazon SageMaker

Project 1:

Analyze Datasets and Train ML Models using AutoML

Perform multiclass classification for sentiment analysis, starting by ingesting product review data into a central repository, an Amazon S3 bucket. Then use Amazon Athena and AWS Glue to analyze the data with interactive queries and visualize the results, which are used during the model development process.

Steps

1. List and access the Women's Clothing Reviews dataset files hosted in an S3 bucket

2. Install and import AWS Data Wrangler

3. Create an AWS Glue Catalog database and list all Glue Catalog databases

4. Register dataset files with the AWS Glue Catalog

5. Write SQL queries to answer specific questions on your dataset and run your queries with Amazon Athena

6. Return the query results in a pandas dataframe

7. Produce and select different plots and visualizations that address your questions
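
Steps 5 and 6 can be sketched locally. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Amazon Athena, and the table name `reviews`, the columns `product_category`, `sentiment`, and `review_body`, and the sample rows are all hypothetical. In the project itself the query runs against the table registered in the Glue Catalog rather than an in-memory database.

```python
import sqlite3

# Hypothetical sample rows: (product_category, sentiment, review_body).
SAMPLE_REVIEWS = [
    ("Dresses", 1, "Love this dress"),
    ("Dresses", -1, "Poor fit"),
    ("Dresses", 1, "Great color"),
    ("Knits", 0, "It is okay"),
    ("Knits", 1, "Very soft"),
    ("Pants", -1, "Too short"),
]

# The kind of question step 5 asks: how many reviews does each category
# have, and what is the average sentiment per category?
QUERY = """
SELECT product_category,
       COUNT(*) AS review_count,
       AVG(sentiment) AS avg_sentiment
FROM reviews
GROUP BY product_category
ORDER BY review_count DESC, product_category
"""


def run_query(rows, query=QUERY):
    """Load rows into an in-memory table and return the query result as dicts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row
        conn.execute(
            "CREATE TABLE reviews "
            "(product_category TEXT, sentiment INTEGER, review_body TEXT)"
        )
        conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
        return [dict(row) for row in conn.execute(query)]
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(SAMPLE_REVIEWS):
        print(row)
```

With AWS Data Wrangler, the same SQL text would typically be passed to `wr.athena.read_sql_query(sql=..., database=...)`, which returns a pandas dataframe ready for plotting in step 7.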

Project 2:

Detect data bias with Amazon SageMaker Clarify

Bias can be present in your data before any model training occurs. Inspecting the dataset for bias can help detect collection gaps, inform your feature engineering, and reveal societal biases the dataset may reflect. In this lab you will analyze bias in the dataset, generate and analyze a bias report, and prepare the dataset for model training.

Steps

1. Download and save raw unbalanced dataset

2. Analyze bias with open-source Clarify

3. Balance the dataset

4. Analyze bias at scale with an Amazon SageMaker Processing job and Clarify

5. Analyze bias reports before and after balancing the dataset
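
To make the before-and-after comparison concrete, here is a minimal plain-Python sketch of two pre-training bias metrics that Clarify reports, class imbalance (CI) and difference in proportions of labels (DPL), together with a simple downsampling step. It does not call Clarify itself; the column names `product_category` and `sentiment`, the toy rows, and the choice to balance by downsampling every group to the size of the smallest one are assumptions made for illustration.

```python
import random
from collections import defaultdict


def class_imbalance(rows, facet, value):
    """CI = (n_a - n_d) / (n_a + n_d); a is the chosen facet value, d is the rest."""
    n_a = sum(1 for r in rows if r[facet] == value)
    n_d = len(rows) - n_a
    return (n_a - n_d) / (n_a + n_d)


def label_proportion_gap(rows, facet, value, label, positive):
    """DPL = q_a - q_d, where q is the share of positive labels in each facet."""
    in_a = [r for r in rows if r[facet] == value]
    in_d = [r for r in rows if r[facet] != value]
    q_a = sum(1 for r in in_a if r[label] == positive) / len(in_a)
    q_d = sum(1 for r in in_d if r[label] == positive) / len(in_d)
    return q_a - q_d


def balance(rows, keys, seed=0):
    """Downsample every group defined by `keys` to the size of the smallest group."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)
    smallest = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


def make_rows(category, sentiment, count):
    """Build `count` hypothetical review rows for one category and label."""
    return [{"product_category": category, "sentiment": sentiment} for _ in range(count)]


# Hypothetical unbalanced dataset: Dresses dominate and skew positive.
RAW = (
    make_rows("Dresses", 1, 6)
    + make_rows("Dresses", -1, 2)
    + make_rows("Knits", 1, 1)
    + make_rows("Knits", -1, 1)
)

if __name__ == "__main__":
    balanced = balance(RAW, ["product_category", "sentiment"])
    for name, data in (("raw", RAW), ("balanced", balanced)):
        ci = class_imbalance(data, "product_category", "Dresses")
        dpl = label_proportion_gap(data, "product_category", "Dresses", "sentiment", 1)
        print(f"{name}: CI={ci:.2f} DPL={dpl:.2f}")
```

Both metrics move toward zero after balancing, which is the pattern to look for when comparing the two bias reports in step 5.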

Project 3:

Train a text classifier with Amazon SageMaker Autopilot

Use Amazon SageMaker Autopilot to train a BERT-based natural language processing (NLP) model. The model will analyze customer feedback and classify the messages into positive (1), neutral (0), and negative (-1) sentiment.

Steps

1. Dataset review

2. Configure the Autopilot job

3. Launch Autopilot job

4. Track Autopilot job progress

5. Feature engineering

6. Model training and tuning

7. Review all output

8. Deploy and test best candidate model
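
Steps 2 and 3 amount to assembling a job configuration and submitting it. The sketch below builds the keyword arguments for the SageMaker `CreateAutoMLJob` API as exposed by boto3, without actually calling AWS. The helper name `build_autopilot_request`, the target column `sentiment`, the candidate limit of 3, the accuracy objective, and every bucket name and role ARN are placeholders chosen for illustration, not values taken from the project.

```python
def build_autopilot_request(job_name, train_s3_uri, output_s3_uri, role_arn,
                            target_column="sentiment", max_candidates=3):
    """Return keyword arguments for the SageMaker CreateAutoMLJob API."""
    if max_candidates < 1:
        raise ValueError("max_candidates must be at least 1")
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                # Column Autopilot should learn to predict (assumed name).
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Three sentiment classes: -1, 0 and 1.
        "ProblemType": "MulticlassClassification",
        "AutoMLJobObjective": {"MetricName": "Accuracy"},
        # Cap the number of candidates to keep the job short.
        "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": max_candidates}},
        "RoleArn": role_arn,
    }


if __name__ == "__main__":
    request = build_autopilot_request(
        job_name="automl-demo",  # hypothetical job name
        train_s3_uri="s3://example-bucket/autopilot/train/",  # hypothetical
        output_s3_uri="s3://example-bucket/autopilot/output/",  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/ExampleRole",  # hypothetical
    )
    print(request["AutoMLJobName"], request["ProblemType"])
    # With AWS credentials configured, the job could then be launched with:
    #   boto3.client("sagemaker").create_auto_ml_job(**request)
```

Once a job is running, `describe_auto_ml_job(AutoMLJobName=...)` returns `AutoMLJobStatus` and `AutoMLJobSecondaryStatus`, which is one way to follow the job through the data analysis, feature engineering, and model tuning phases listed in steps 4 to 6.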

Project 4:

Train a text classifier using Amazon SageMaker BlazingText built-in algorithm

Use the SageMaker BlazingText built-in algorithm to predict the sentiment of each customer review. BlazingText is a variant of FastText, which is based on word2vec. For more information on BlazingText, see the documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html

Steps

1. Prepare dataset

2. Train the model with Amazon SageMaker BlazingText

3. Deploy the model

4. Test the model
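
For step 1, BlazingText in supervised (text classification) mode expects one example per line, with the label prefixed by `__label__` and followed by space-separated tokens. The sketch below shows that formatting under a few assumptions: the regex tokenizer is a simple stand-in for whatever tokenizer the project actually uses, and the function names and sample reviews are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def to_blazingtext_line(sentiment, review_body):
    """Format one example as '__label__<label> <space-separated tokens>'."""
    if sentiment not in (-1, 0, 1):
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return f"__label__{sentiment} " + " ".join(tokenize(review_body))


if __name__ == "__main__":
    # Hypothetical (sentiment, review_body) pairs.
    samples = [
        (1, "Love this dress!"),
        (0, "It is okay."),
        (-1, "Too short, poor fit."),
    ]
    for sentiment, review_body in samples:
        print(to_blazingtext_line(sentiment, review_body))
```

Writing these lines to a text file, one per review, gives a training file in the shape the algorithm reads; the same tokenization should be applied to reviews sent to the deployed endpoint in step 4 so that training and inference inputs match.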

Posted in Artificial Intelligence, Deep Learning, Machine Learning