
DS 710 University of Wisconsin Milwaukee Python Pandas Data Frames Project

Question Description

Problem 1(a). Reading Amazon Reviews.

Download the file of Amazon gourmet food reviews from the Stanford Large Network Dataset Collection. (Your computer may already have a utility that can unzip the archive to a text file; if not, 7-Zip is a useful utility for Windows. You can also find an online tool by searching the web for "open .gz files online".)

Create a pandas DataFrame object with the following entries for each review:

  1. Product ID
  2. Number of people who voted this review helpful
  3. Total number of people who rated this review
  4. Reviewer’s score rating of the product
  5. Text of the review (this column will be dropped before you write your data file and port it to R).
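
One way to build this DataFrame is sketched below. It is only a sketch under stated assumptions: the assignment does not describe the file layout, so the code assumes each review is a block of "key: value" lines separated by a blank line, with helpfulness stored as "helpful/total". The field names in WANTED, the output column names, and the two-record SAMPLE are illustrative and should be checked against the file you actually download.

```python
import io

import pandas as pd

# Made-up two-record sample in the ASSUMED "key: value" layout.
SAMPLE = """\
product/productId: B000000001
review/helpfulness: 1/2
review/score: 5.0
review/text: Great snack!

product/productId: B000000002
review/helpfulness: 0/0
review/score: 2.0
review/text: Too salty, would not buy again.
"""

# ASSUMED raw field names that carry the five required entries.
WANTED = {"product/productId", "review/helpfulness", "review/score", "review/text"}
COLUMNS = ["product_id", "helpful_votes", "total_votes", "score", "text"]


def to_row(record):
    """Convert one raw record (dict of strings) into typed values."""
    # ASSUMPTION: helpfulness looks like "helpful/total", e.g. "1/2".
    helpful, _, total = record.get("review/helpfulness", "0/0").partition("/")
    return {
        "product_id": record.get("product/productId"),
        "helpful_votes": int(helpful),
        "total_votes": int(total),
        "score": float(record.get("review/score", "nan")),
        "text": record.get("review/text", ""),
    }


def parse_reviews(lines):
    """Turn an iterable of text lines into a DataFrame, one row per review."""
    rows, current = [], {}
    for line in lines:
        line = line.strip()
        if not line:  # a blank line closes the current record
            if current:
                rows.append(to_row(current))
                current = {}
            continue
        key, sep, value = line.partition(": ")
        if sep and key in WANTED:
            current[key] = value
    if current:  # the last record may not be followed by a blank line
        rows.append(to_row(current))
    return pd.DataFrame(rows, columns=COLUMNS)


reviews = parse_reviews(io.StringIO(SAMPLE))
print(reviews)
```

For the real archive, the same function could be given a file object opened with Python's gzip module in text mode, for example gzip.open(path, "rt", errors="replace"). The path and the right text encoding depend on your download.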

Problem 1(b). Analyzing review text.

Add columns to your DataFrame for

  • the length of a review,
  • the number of exclamation points in a review, and
  • the fraction of people who rated a review helpful.
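
The three columns can be derived with vectorized pandas operations. The sketch below uses a small made-up DataFrame; the column names (helpful_votes, total_votes, text, and the three new ones) are illustrative choices, not names required by the assignment. Reviews that nobody rated have a total of zero, so the fraction is left as NaN for them here. That is one reasonable convention, not the only one.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; column names are illustrative.
reviews = pd.DataFrame({
    "helpful_votes": [1, 0, 3],
    "total_votes": [2, 0, 4],
    "text": ["Great snack!", "Too salty.", "Wow!! Amazing!"],
})

# Length of each review in characters.
reviews["review_length"] = reviews["text"].str.len()

# Number of exclamation points in each review.
reviews["exclamation_count"] = reviews["text"].str.count("!")

# Fraction of raters who found the review helpful.
# Replacing a zero total with NaN makes unrated reviews come out as NaN.
reviews["helpful_fraction"] = (
    reviews["helpful_votes"] / reviews["total_votes"].replace(0, np.nan)
)

print(reviews)
```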

Problem 1(c). Summary statistics.

Compute these using Python:

  1. How many reviews are in the data set?
  2. What is the average length of a review (in characters)?
  3. What is the average rating?
  4. What is the greatest number of exclamation marks used in a single review?

Use the pandas package to answer these questions.
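
Each of the four questions maps onto a single pandas call. The sketch below runs on a made-up three-row DataFrame with illustrative column names, so the printed numbers describe the sample only, not the real data set.

```python
import pandas as pd

# Made-up sample; in the assignment these columns come from Problems 1(a) and 1(b).
reviews = pd.DataFrame({
    "score": [5.0, 2.0, 4.0],
    "review_length": [12, 10, 14],
    "exclamation_count": [1, 0, 3],
})

n_reviews = len(reviews)                               # 1. number of reviews
mean_length = reviews["review_length"].mean()          # 2. average length in characters
mean_score = reviews["score"].mean()                   # 3. average rating
max_exclamations = reviews["exclamation_count"].max()  # 4. most exclamation marks

print(n_reviews, mean_length, mean_score, max_exclamations)
```

Calling reviews.describe() would report the count, mean, and max of every numeric column at once, which is a quick way to cross-check the individual values.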

Problem 1(d). Export.

Save your DataFrame as a .csv file suitable for future analysis in R.

Requirements
  • Your .csv file must not include the review text column, as the presence of commas and quotation marks will make reading the file difficult.
  • You should also convert entries from NaN to the empty string before saving.
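
A minimal sketch of the export step follows, again on made-up rows with illustrative column names. It drops the text column and relies on the na_rep argument of to_csv to write missing values as empty fields. This is one way to meet the NaN requirement; calling fillna("") on the DataFrame before saving is an alternative. The CSV is produced as a string here so the result can be inspected, and passing a file name as the first argument of to_csv would write it to disk instead.

```python
import numpy as np
import pandas as pd

# Made-up sample; the second review has no ratings, so its fraction is NaN.
reviews = pd.DataFrame({
    "product_id": ["B000000001", "B000000002"],
    "helpful_votes": [1, 0],
    "total_votes": [2, 0],
    "score": [5.0, 2.0],
    "text": ["Great snack!", "Too salty, would not buy again."],
    "helpful_fraction": [0.5, np.nan],
})

# Drop the free-text column; its commas and quotes complicate parsing.
export = reviews.drop(columns=["text"])

# index=False omits the row index; na_rep="" writes NaN as an empty field.
csv_text = export.to_csv(index=False, na_rep="")
print(csv_text)
```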