
Yelp Dataset




The Yelp Dataset contains 8 categories: restaurants, shopping, nightlife, pets, hotelstravel, homeservices, beautysvc and active, which we crawled from the Yelp.com website. Most of the users in these categories are from New York City. We removed personal information before sharing the dataset. The shared files are described below.

Each category contains a ratings file, a users file, an items file, a training set file and a test set file.

· All ratings are contained in the file "ratings.txt" and are in the following format:
UserID::ItemID::Rating
for example: 879::14589::3
It means that user 879 rated item 14589 and gave it a score of 3. Rating scores are on a 5-star scale (whole-star ratings only).
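The rating format above can be read with a few lines of Python. This is only a minimal sketch: the names `Rating` and `parse_rating` are illustrative and not part of the dataset, and the check that scores lie between 1 and 5 is an assumption based on the 5-star scale described above.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One line of ratings.txt (hypothetical helper type)."""
    user_id: int
    item_id: int
    score: int


def parse_rating(line: str) -> Rating:
    """Parse a 'UserID::ItemID::Rating' line into a Rating."""
    user_id, item_id, score = line.strip().split("::")
    rating = Rating(int(user_id), int(item_id), int(score))
    # Assumption: whole-star scores run from 1 to 5.
    if not 1 <= rating.score <= 5:
        raise ValueError(f"score out of range: {line!r}")
    return rating


print(parse_rating("879::14589::3"))
```

A whole file could then be loaded with something like `[parse_rating(l) for l in open("ratings.txt") if l.strip()]`.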

· User information is in the file "users.txt" and is in the following format:
UserID:{Friend1ID,Friend2ID,...,}
for example: 751:{1838,255,1326,382,}
It means that user 751 has four friends in this category: 1838, 255, 1326 and 382. Only users who have provided some demographic information are included in this data set.
To protect users' privacy, we reassigned user and item IDs.
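Note that the user file uses a single colon as separator and that the friend list ends with a trailing comma. A small sketch of a parser that handles both is shown here; `parse_user` is a hypothetical name, and the handling of an empty friend list written as `{}` is an assumption, since the description above shows no such case.

```python
from typing import List, Tuple


def parse_user(line: str) -> Tuple[int, List[int]]:
    """Parse a 'UserID:{Friend1ID,Friend2ID,...,}' line."""
    user_id, _, friends = line.strip().partition(":")
    # Drop the surrounding braces, then skip the empty field
    # left behind by the trailing comma.
    friends = friends.strip().strip("{}")
    friend_ids = [int(f) for f in friends.split(",") if f.strip()]
    return int(user_id), friend_ids


user_id, friend_ids = parse_user("751:{1838,255,1326,382,}")
print(user_id, friend_ids)
```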

· Item information is in the file "items.txt" and is in the following format:
ItemID::Category
for example: 58::Seafood French
It means that item 58 belongs to the category Seafood French. Item categories are crawled from Yelp. We give the subcategories and data statistics of each category.
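Item lines can be loaded into a lookup table in the same spirit. In this sketch the category is kept as one raw string, because the description above does not say whether a value such as "Seafood French" is a single label or several subcategories; `load_items` is an illustrative name only.

```python
from typing import Dict, Iterable


def load_items(lines: Iterable[str]) -> Dict[int, str]:
    """Map ItemID to its raw category string from 'ItemID::Category' lines."""
    items = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        item_id, _, category = line.rstrip("\n").partition("::")
        items[int(item_id)] = category.strip()
    return items


items = load_items(["58::Seafood French\n"])
print(items[58])
```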

· Training data and test data are in per-category files, e.g. "restaurants_training.txt" and "restaurants_test.txt".
These files are similar in format to the ratings file. We use 80% of the ratings data as the training set and the remaining 20% as the test set. More precisely, we use 80% of each user's rating data as the training set, to ensure that all users' latent features are learnt from the training set.
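A per-user 80/20 split of this kind could be reproduced roughly as follows. This is a sketch under stated assumptions, not the procedure used to build the shipped training and test files: the random shuffle, the fixed seed, the rounding up of the cut point and the rule that every user keeps at least one training rating are all choices made for this illustration. Ratings are taken to be `(user_id, item_id, score)` tuples.

```python
import math
import random
from collections import defaultdict


def split_per_user(ratings, train_fraction=0.8, seed=0):
    """Split ratings so each user contributes about train_fraction to training."""
    by_user = defaultdict(list)
    for rating in ratings:
        by_user[rating[0]].append(rating)  # rating[0] is the user id

    rng = random.Random(seed)  # fixed seed for a repeatable split (assumption)
    train, test = [], []
    for user_id in sorted(by_user):
        user_ratings = by_user[user_id]
        rng.shuffle(user_ratings)
        # Assumption: round up, and keep at least one rating per user
        # in training so every user appears in the training set.
        cut = max(1, math.ceil(train_fraction * len(user_ratings)))
        train.extend(user_ratings[:cut])
        test.extend(user_ratings[cut:])
    return train, test


ratings = [(1, item, 3) for item in range(10)] + [(2, 99, 5)]
train, test = split_per_user(ratings)
print(len(train), len(test))
```

Splitting within each user, rather than over the pooled ratings, is what guarantees that no user appears only in the test set.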

Dataset Rights
 

The YELP dataset consists of 8 categories downloaded from the Yelp.com website. Use of these data must respect the corresponding terms of use.

Citation

If you use the YELP dataset please cite our papers:
[1] He Feng, and Xueming Qian, “Recommendation via user’s personality and social contextual”, ACM CIKM 2013.

[2] Xueming Qian, He Feng, Guoshuai Zhao, and Tao Mei, “Personalized Recommendation Combining User Interest and Social Circle”, IEEE Trans. Knowledge and Data Engineering, vol.x, no.y, 2013, pp.xx-yy. Accepted

Downloads

Dataset: user, item and rating information for 8 categories (10555 users, 96974 items and 300847 ratings, 4.69M)


 

 



 