2024 Week 02 – Text Clustering in Einstein Discovery

Introduction

Welcome to Week 2 of the 2024 CRMA Workout Wednesday challenge. This week we will be working with Text Clustering in Einstein Discovery models. Text Clustering, sometimes grouped under Text Mining or Data Mining, analyzes unstructured data and groups similar pieces of text into categories or… clusters.
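To make the idea of grouping similar text concrete, here is a minimal sketch in plain Python. It is not how Einstein Discovery clusters text internally; it is a toy illustration that assumes word overlap (Jaccard similarity) is a good enough measure of similarity. The stopword list, the `threshold` value and the sample reviews are all invented for this example.

```python
import re

# Small, hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {"the", "a", "an", "was", "is", "and", "were", "very",
             "to", "of", "in", "it", "at", "but"}

def tokens(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(texts, threshold=0.2):
    """Greedy single-pass clustering: each text joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `texts`."""
    clusters = []
    for i, text in enumerate(texts):
        tok = tokens(text)
        for members in clusters:
            if jaccard(tok, tokens(texts[members[0]])) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters

# Invented sample reviews, not taken from the challenge dataset.
reviews = [
    "The room was clean and the bed was comfortable",
    "Staff were rude at check-in",
    "Clean room, comfortable bed, great view",
    "Rude staff and slow check-in",
]

groups = cluster(reviews)
for group in groups:
    print([reviews[i] for i in group])
```

With these four reviews, the two comments about rooms end up in one group and the two about staff in another, which is the kind of grouping the model surfaces as clusters.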

Our models often ingest tabular data to analyze and predict, but how can we derive value from data that is not collected in this standard format? Unstructured data can come in the form of customer feedback, product reviews, survey responses, chat dialogue, emails, etc.

In this challenge you will create an Einstein Discovery model to analyze review ratings and review comments. By using the text clustering feature in the model, you can determine which keywords in the review text have an impact on the user review rating.
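If you want a rough feel for what "keyword impact on rating" means before building the model, the sketch below compares the average rating of reviews containing a word with the overall average. This is only an illustrative approximation, not the method Einstein Discovery uses; the function name `keyword_impact`, the `min_count` cutoff and the sample rows are assumptions made up for this example.

```python
import re
from collections import defaultdict

def keyword_impact(rows, min_count=2):
    """For each word seen in at least `min_count` reviews, return
    (mean rating of reviews containing the word) - (overall mean rating).
    `rows` is a list of (rating, review_text) pairs."""
    overall = sum(rating for rating, _ in rows) / len(rows)
    ratings_by_word = defaultdict(list)
    for rating, text in rows:
        # Use a set so a word repeated within one review counts once.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            ratings_by_word[word].append(rating)
    return {
        word: sum(ratings) / len(ratings) - overall
        for word, ratings in ratings_by_word.items()
        if len(ratings) >= min_count
    }

# Invented sample data, not taken from the challenge dataset.
rows = [
    (5, "Clean room and friendly staff"),
    (4, "Friendly staff, clean lobby"),
    (2, "Dirty room and noisy street"),
    (1, "Dirty bathroom, noisy neighbours"),
]

impact = keyword_impact(rows)
for word, delta in sorted(impact.items(), key=lambda item: item[1]):
    print(f"{word:10s} {delta:+.2f}")
```

Words with a positive value appear mostly in higher-rated reviews and words with a negative value in lower-rated ones. Filler words such as "and" also show up here, which is one reason a real text feature does more than a raw word count.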

Requirements

Create a model to analyze review rating
Include City, Province, Name, Review Text and Categories in your settings
Train your model and review the data insights
1. Can you determine which key words in Review Text have a positive/negative impact on Rating?
2. What change can you implement in the Categories setting to group a large number of undefined values? Hint: it shows in the screenshot below
Update your model to Detect Sentiment in the Review Text setting and re-run.
1. What categories had the most positive impact on reviews?

Tips

You can’t create a model without a dataset
You have to manually configure the unstructured data settings before running your model

Dataset

You can find the dataset to use for this week's challenge here: https://data.world/datafiniti/hotel-reviews/workspace/file?filename=Datafiniti_Hotel_Reviews.csv
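Before uploading the file, it can help to check that the fields you plan to use are present and populated. The sketch below is one way to do that with the Python standard library. The column names in `FIELDS` are assumptions about the CSV header and should be checked against the downloaded file; the in-memory sample is invented so the snippet runs without the download.

```python
import csv
import io

# Assumed column names; verify them against the header row of the real file.
FIELDS = ["city", "province", "name", "categories",
          "reviews.rating", "reviews.text"]

def summarize(csv_file):
    """Count rows and report which expected columns are missing or blank."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [field for field in FIELDS if field not in header]
    blanks = {field: 0 for field in FIELDS if field not in missing}
    total = 0
    for row in reader:
        total += 1
        for field in blanks:
            if not (row[field] or "").strip():
                blanks[field] += 1
    return {"rows": total, "missing_columns": missing, "blank_counts": blanks}

# In-memory stand-in for the downloaded file, invented for this illustration.
sample = io.StringIO(
    "name,city,province,categories,reviews.rating,reviews.text\n"
    "Hotel A,Austin,TX,Hotels,5,Great stay\n"
    "Hotel B,Reno,NV,,3,\n"
)

report = summarize(sample)
print(report)
```

To run it against the real download, open the file with `open("Datafiniti_Hotel_Reviews.csv", newline="", encoding="utf-8")` and pass the file object to `summarize`. A high blank count in a column is worth knowing about before you configure that field in the model.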

After you finish your workout, share a screenshot of your solutions or interesting insights.

Share either on Twitter, using the hashtags #WOW2024 and #CRMA and tagging @genetis, @PreenzJ, @LaGMills and @msayantani. (Or you can use this handy link to do that.)

Or share on LinkedIn, tagging Alex Waleczek, Preena Johansen, Lauren Mills, Sayantani Mitra and Phillip Schrijnemaekers and using the hashtag #WOW2024.

Also make sure to fill out the Submission Tracker to track your progress and help us judge the difficulty of our challenges.
