I deployed a Yelp Rating Prediction API (http://br-yelp-predict-rating.herokuapp.com) that uses a machine learning model trained on Yelp’s open dataset to predict a review’s rating based on its business category.

I wrote an article, Convert Yelp Dataset to CSV, that walks step by step through loading the largest file in the Yelp dataset, review.json (5.2 gigabytes, over 6 million rows), and converting it to a more manageable CSV file. At that size, the file can be troublesome to load inside a Jupyter Notebook. After successfully converting the dataset, check out my next post for exploratory data analysis with visualizations of the dataset!

My API takes a JSON string with two fields, “category” and “review”, and responds with the predicted rating for the review.

When submitting a review, make sure to specify which category the review is for.

Example input:

{"category": "Auto Repair", 
 "review": "Service is the worst and the wait time is too long."}

The API will return a rating based on the category and review. Example output:

{'Category': 'Auto_Repair',
 'Review': 'Service is the worst and the wait time is too long.',
 'Predict rating': 1}
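
To make the request format concrete, here is a minimal Python sketch of a client that sends the example input above. The base URL comes from this post, but the route ("/"), the POST method, and the JSON content type are assumptions on my part, and the helper names build_payload and predict_rating are hypothetical. Because the example output above uses single quotes, the sketch returns the raw response text rather than assuming it parses as strict JSON.

```python
import json
import urllib.request

# Base URL from the post. The "/" route and the POST method are assumptions.
API_URL = "http://br-yelp-predict-rating.herokuapp.com/"


def build_payload(category: str, review: str) -> bytes:
    """Encode the two fields the API expects as a UTF-8 JSON body."""
    return json.dumps({"category": category, "review": review}).encode("utf-8")


def predict_rating(category: str, review: str,
                   url: str = API_URL, timeout: float = 10.0) -> str:
    """Send one review to the API and return the raw response text."""
    request = urllib.request.Request(
        url,
        data=build_payload(category, review),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",  # assumed HTTP method
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling predict_rating("Auto Repair", "Service is the worst and the wait time is too long.") would send the example input; if the assumed route or method is wrong, urlopen raises an HTTPError, which is a quick way to find out.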

Below is the list of categories used in the Yelp dataset:

  • Active Life
  • Auto Repair
  • Automotive
  • Beauty Spas
  • Contractors
  • Doctors
  • Event Planning Services
  • Fashion
  • Fast Food
  • Hair Salons
  • Health Medical
  • Home Garden
  • Home Services
  • Local Services
  • Professional Services
  • Real Estate
  • Shopping
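
Since the category has to be one of the names above, it can help to check it on the client side before sending a request. The sketch below is only an illustration: normalize_category and to_label are hypothetical helpers, and the underscore form is inferred from the example output ('Auto_Repair'), not from any documented behavior of the API.

```python
# Category names copied from the list above.
CATEGORIES = (
    "Active Life", "Auto Repair", "Automotive", "Beauty Spas", "Contractors",
    "Doctors", "Event Planning Services", "Fashion", "Fast Food",
    "Hair Salons", "Health Medical", "Home Garden", "Home Services",
    "Local Services", "Professional Services", "Real Estate", "Shopping",
)

# Case-insensitive lookup from lowercased name to the canonical spelling.
_LOOKUP = {name.lower(): name for name in CATEGORIES}


def normalize_category(category: str) -> str:
    """Return the canonical category name, or raise ValueError if unknown."""
    cleaned = " ".join(category.split())  # trim and collapse whitespace
    try:
        return _LOOKUP[cleaned.lower()]
    except KeyError:
        raise ValueError(f"Unknown category: {category!r}") from None


def to_label(category: str) -> str:
    """Underscore form seen in the example output (an inferred convention)."""
    return normalize_category(category).replace(" ", "_")
```

With this in place, a typo such as "Auto Repiar" fails locally with a ValueError instead of producing a confusing response from the server.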