rszk/sea: Sentiment Expression Annotation dataset

This package contains the Sentiment Expression Annotation (SEA) dataset described in [1]. The SEA annotations were created on the English dataset released for Task 5 of SemEval 2016, the shared task on aspect-based sentiment analysis (ABSA).

Due to licensing restrictions, the original data cannot accompany these annotations. As a workaround, a script has been included which takes in the original dataset and attaches the SEA annotations. The script should be used as follows:

python generate-sea.py -x <PATH_TO_XML_FILE> -d <DOMAIN> -s <SUBSET>

PATH_TO_XML_FILE points to the original XML file released by SemEval 2016 task 5 organisers (for subtask 1), which can be found at http://alt.qcri.org/semeval2016/task5/index.php?id=data-and-tools. DOMAIN is either laptop or restaurant (not hotel) and SUBSET is either train or test. For example, assuming that the original XML files have been downloaded into the current directory, the following command generates the annotation files for the laptop training set in the current directory:

python generate-sea.py -x ABSA16_Laptops_Train_SB1_v2.xml -d laptop -s train
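
When the script has to be run for more than one domain or subset, it can help to build the command programmatically and reject unsupported values early. The following is a minimal sketch, not part of this package: `build_command` is a hypothetical helper, it uses only the `-x`, `-d` and `-s` options shown above, and it assumes `generate-sea.py` sits in the current directory.

```python
import sys

# Values accepted by generate-sea.py according to this README.
DOMAINS = {"laptop", "restaurant"}
SUBSETS = {"train", "test"}


def build_command(xml_path, domain, subset):
    """Return the argument list for one generate-sea.py run (hypothetical helper)."""
    if domain not in DOMAINS:
        raise ValueError(f"unsupported domain: {domain!r}")
    if subset not in SUBSETS:
        raise ValueError(f"unsupported subset: {subset!r}")
    # sys.executable reuses the interpreter that runs this sketch.
    return [sys.executable, "generate-sea.py",
            "-x", xml_path, "-d", domain, "-s", subset]


cmd = build_command("ABSA16_Laptops_Train_SB1_v2.xml", "laptop", "train")
# To actually run the script: import subprocess; subprocess.run(cmd, check=True)
```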

The generated annotation files are as follows:

train.laptop.at: aspect terms and their polarities, one per line, extracted from the XML file
train.laptop.aio: SEA annotation in columnar IO tagging format, one token per line, with blank lines separating sentences

The SEA annotations in the aio file are aligned with the aspect terms in the at file. This means that the first sentence in the aio file is the SEA annotation corresponding to the first aspect term in the at file, and so on.
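
This positional alignment can be checked and used with a short sketch like the one below. It is not part of the package: `split_sentences` and `align` are hypothetical names, and the code assumes that the at file holds exactly one aspect term entry per non-empty line and that sentences in the aio file are separated by one or more blank lines. Each at line is kept as an opaque string, since its exact column layout is not specified here.

```python
def split_sentences(aio_text):
    """Split the contents of an aio file into per-sentence lists of lines."""
    blocks, current = [], []
    for line in aio_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the sentence collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def align(at_text, aio_text):
    """Pair the i-th aspect term line with the i-th annotated sentence."""
    terms = [line for line in at_text.splitlines() if line.strip()]
    sentences = split_sentences(aio_text)
    if len(terms) != len(sentences):
        raise ValueError(
            f"{len(terms)} aspect terms but {len(sentences)} annotated sentences"
        )
    return list(zip(terms, sentences))
```

The two arguments are the full contents of the generated files, for example read with `open("train.laptop.at", encoding="utf-8").read()`; the UTF-8 encoding is an assumption. A length mismatch raises an error rather than silently pairing the wrong items.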

The annotations are in columnar format where the tokens constituting the sentiment expression are tagged with I and the others with O.
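
As a rough illustration of reading this format, the sketch below extracts the I-tagged tokens of one sentence. It assumes that columns are separated by whitespace, with the token in the first column and the tag in the last; the function names and the sample sentence are made up for illustration and are not taken from the dataset.

```python
def parse_sentence(lines):
    """Turn the lines of one sentence into (token, tag) pairs."""
    pairs = []
    for line in lines:
        columns = line.split()
        # Assumption: token in the first column, I/O tag in the last one.
        token, tag = columns[0], columns[-1]
        if tag not in ("I", "O"):
            raise ValueError(f"unexpected tag {tag!r} in line {line!r}")
        pairs.append((token, tag))
    return pairs


def expression_spans(pairs):
    """Group consecutive I-tagged tokens into spans."""
    spans, current = [], []
    for token, tag in pairs:
        if tag == "I":
            current.append(token)
        elif current:
            spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


# Made-up sentence for illustration only.
sample = ["The O", "battery O", "lasts I", "forever I", ". O"]
print(expression_spans(parse_sentence(sample)))  # [['lasts', 'forever']]
```

Because IO tagging has no begin marker, two sentiment expressions that are directly adjacent would come out of this sketch as a single span.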

Note that the original sentences have been tokenized by the script.

[1] Rasoul Kaljahi and Jennifer Foster. Sentiment expression boundaries in sentiment polarity classification. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), 2018.
