| TripAdvisor Ireland Dataset |
|
This page contains supplementary material for the paper: G. Wu; D. Greene; B. Smyth; P. Cunningham. (2010), "Distortion as a Validation Criterion in the Identification of Suspicious Reviews". In Proceedings of the First Workshop on Social Media Analytics (SOMA'10). [PDF]

Description

Assessing the trustworthiness of reviews is a key issue for the maintainers of opinion sites such as TripAdvisor. In this paper we propose a distortion criterion for assessing the impact of methods for uncovering suspicious hotel reviews in TripAdvisor. The principle is that dishonest reviews will distort the overall popularity ranking for a collection of hotels. Thus a mechanism that deletes dishonest reviews will distort the popularity ranking significantly, when compared with the removal of a similar set of reviews at random. This distortion can be quantified by comparing popularity rankings before and after deletion, using rank correlation. We present an evaluation of this strategy in the assessment of shill detection mechanisms on a dataset of hotel reviews collected from TripAdvisor.

Datasets
The Irish TripAdvisor dataset provided here comprises 29,799 reviews published by 21,851 unique reviewers on TripAdvisor, covering hotels from all regions of Ireland over a two-year time window from September 2007 to September 2009. Note that we only consider a subset of 843 hotels which received four or more reviews during this time. Approximately two thirds of the reviews are positive, i.e. awarding at least four out of five stars. The dataset is anonymised and consists of a single comma-separated (CSV) file, with a header row and fields <HotelID, MemberName, DayNumber, Rating>: |
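As a rough illustration of how the file layout and the distortion idea fit together, the following Python sketch reads rows in the <HotelID, MemberName, DayNumber, Rating> layout and compares a hotel ranking before and after a set of reviews is removed. Several things here are assumptions rather than details taken from the paper: the sample rows are invented, the header names are assumed to match the field names exactly, the mean rating per hotel stands in for TripAdvisor's popularity ranking, and Spearman's coefficient is used as one possible rank correlation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the documented layout; the values are made up.
SAMPLE_CSV = """HotelID,MemberName,DayNumber,Rating
h1,u1,10,5
h1,u2,12,4
h1,u3,30,5
h2,u4,11,3
h2,u5,40,2
h2,u1,41,3
h3,u6,15,4
h3,u7,18,4
h3,u8,50,1
"""


def load_reviews(handle):
    """Read (hotel, member, day, rating) tuples from a CSV handle with a header row."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [
        (row["HotelID"], row["MemberName"], int(row["DayNumber"]), float(row["Rating"]))
        for row in reader
    ]


def mean_ratings(reviews):
    """Mean rating per hotel: an assumed stand-in for the popularity score."""
    totals = defaultdict(lambda: [0.0, 0])
    for hotel, _member, _day, rating in reviews:
        totals[hotel][0] += rating
        totals[hotel][1] += 1
    return {hotel: total / count for hotel, (total, count) in totals.items()}


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def rank_agreement(reviews, removed):
    """Correlation of hotel scores before and after dropping the reviews at the
    given indices; lower values mean the removal distorted the ranking more."""
    before = mean_ratings(reviews)
    after = mean_ratings([r for i, r in enumerate(reviews) if i not in removed])
    hotels = sorted(set(before) & set(after))
    return spearman([before[h] for h in hotels], [after[h] for h in hotels])


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
print(rank_agreement(reviews, {6, 7}))  # drop two reviews of hotel h3
```

To apply the sketch to the real file, `load_reviews` could be given an open file handle instead of the in-memory sample. Following the criterion described above, the correlation obtained after removing flagged reviews would then be compared with the correlation obtained after removing the same number of reviews chosen at random.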