In recent years, mobile accessibility has become an important trend, with the goal of allowing all users to use any app without significant limitations. User reviews include insights that are useful for app evolution. However, as the number of received reviews grows, manually analyzing them becomes tedious and time-consuming, especially when searching for accessibility reviews. The goal of this paper is to support the automated identification of accessibility in user reviews, to help technology professionals prioritize their handling and thus create more inclusive apps. In particular, we design a model that takes accessibility user reviews as input and learns their keyword-based features in order to make a binary decision, for a given review, on whether it is about accessibility or not. The model is evaluated using a total of 5326 mobile app reviews. The findings show that (1) our model can accurately identify accessibility reviews, outperforming two baselines, namely a keyword-based detector and a random classifier; and (2) our model achieves an accuracy of 85% with a relatively small training dataset, and the accuracy improves as we increase the size of the training dataset.
More specifically, the research questions that we investigated are:
RQ1. To what extent can machine learning models accurately distinguish accessibility reviews from non-accessibility reviews?
To answer this research question, we rely on a manually curated dataset of 2663 accessibility reviews, which we augment with another 2663 non-accessibility reviews. We then perform a comparative study of state-of-the-art binary classification models to identify the model that best distinguishes accessibility reviews from non-accessibility reviews.
RQ2. How effective is our machine learning approach in identifying accessibility reviews?
To answer this research question, we compare our best-performing model against the two baselines named in our findings, a keyword-based detector and a random classifier, on the same set of accessibility and non-accessibility reviews.
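To make the two baselines named in the findings concrete, the following is a minimal sketch of a keyword-based detector and a random classifier, scored by accuracy. The keyword list, the sample reviews, and all function names are assumptions made for illustration; they are not the keywords or data used in the paper.

```python
import random

# Hypothetical keyword list for illustration only; not the list used in the paper.
KEYWORDS = ("screen reader", "talkback", "voiceover", "font size", "contrast")


def keyword_detector(review):
    """Flag a review as accessibility-related if it mentions any keyword."""
    text = review.lower()
    return any(keyword in text for keyword in KEYWORDS)


def random_classifier(review, rng):
    """Guess the label with equal probability, ignoring the text."""
    return rng.random() < 0.5


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Tiny made-up sample: (review text, is_accessibility)
reviews = [
    ("TalkBack does not read the buttons", True),
    ("Please add an option for larger font size", True),
    ("Hard to use with one hand since my injury", True),
    ("Crashes every time I open it", False),
    ("Love the new update", False),
    ("Low contrast theme looks cool", False),
]
texts = [text for text, _ in reviews]
labels = [label for _, label in reviews]

rng = random.Random(0)  # fixed seed so the random baseline is repeatable
keyword_acc = accuracy([keyword_detector(t) for t in texts], labels)
random_acc = accuracy([random_classifier(t, rng) for t in texts], labels)
print(f"keyword baseline: {keyword_acc:.2f}")
print(f"random baseline:  {random_acc:.2f}")
```

The made-up sample includes one accessibility review that contains no keyword and one non-accessibility review that does, which are the two ways a pure keyword match can go wrong.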
RQ3. What is the size of the training dataset needed for the classification to effectively identify accessibility reviews?
In this research question, we empirically determine the minimum number of training instances, i.e., accessibility reviews, that our best-performing model needs to reach its best performance. This information helps practitioners estimate the amount of manual work (i.e., preparation of training data) needed to build this solution.
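The experiment behind this question can be pictured as a learning curve: train on progressively larger subsets and measure accuracy on a fixed test set. The sketch below assumes a tiny multinomial Naive Bayes classifier as a stand-in, since the model itself is not specified on this page, and uses made-up reviews. The names `train`, `predict`, and `learning_curve` are hypothetical.

```python
import math
from collections import Counter


def train(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return True if the accessibility class has the higher log-score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Laplace smoothing on both the prior and the word likelihoods.
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


def learning_curve(train_set, test_set, sizes):
    """Accuracy on test_set after training on the first n examples, for each n."""
    curve = []
    for n in sizes:
        model = train(train_set[:n])
        hits = sum(predict(model, text) == label for text, label in test_set)
        curve.append((n, hits / len(test_set)))
    return curve


# Made-up reviews for illustration: (text, is_accessibility)
train_set = [
    ("screen reader skips the menu", True),
    ("app crashes on startup", False),
    ("text too small to read", True),
    ("great app love it", False),
    ("buttons have no labels for talkback", True),
    ("too many ads", False),
]
test_set = [
    ("talkback cannot read the buttons", True),
    ("love it but too many ads", False),
]

curve = learning_curve(train_set, test_set, sizes=[2, 4, 6])
for n, acc in curve:
    print(f"{n} training reviews -> accuracy {acc:.2f}")
```

On real data, the smallest training size at which the curve flattens gives an estimate of how many labeled reviews are worth preparing.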
Request
-------------
{
  "Inputs": {
    "input1": {
      "ColumnNames": [
        "review_text"
      ],
      "Values": [
        [
          "value"
        ],
        [
          "value"
        ]
      ]
    }
  },
  "GlobalParameters": {}
}
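A request body in this shape can be assembled from a plain list of review strings. The helper below is a minimal sketch: `build_payload` is a hypothetical name, the example reviews are made up, and the only details taken from the sample above are the `input1` key, the `review_text` column, and the empty `GlobalParameters` object.

```python
import json


def build_payload(reviews):
    """Wrap review strings in the request structure shown above.

    `build_payload` is an illustrative helper, not part of the web service.
    """
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": ["review_text"],
                # One inner list per review, matching the single column.
                "Values": [[text] for text in reviews],
            }
        },
        "GlobalParameters": {},
    }


payload = build_payload(["I cannot use the app with TalkBack", "Great game"])
body = json.dumps(payload).encode("utf-8")  # bytes suitable for a POST body
print(json.dumps(payload, indent=2))
```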
Response
-------------
{
  "Results": {
    "output1": {
      "type": "DataTable",
      "value": {
        "ColumnNames": [
          "Scored Labels"
        ],
        "ColumnTypes": [
          "String"
        ],
        "Values": [
          [
            "value"
          ],
          [
            "value"
          ]
        ]
      }
    }
  }
}
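Given a response in this shape, the predicted label for each submitted review can be read from the `Scored Labels` column. The following is a small sketch under that assumption; `extract_labels` is a hypothetical helper, and the label strings in the sample are placeholders because the sample response above does not show the actual values the service returns.

```python
import json


def extract_labels(response_text):
    """Return the "Scored Labels" value for each row, in request order."""
    table = json.loads(response_text)["Results"]["output1"]["value"]
    # Look the column up by name rather than assuming it is always first.
    column = table["ColumnNames"].index("Scored Labels")
    return [row[column] for row in table["Values"]]


# Placeholder response mirroring the structure above; the label strings
# are stand-ins, not values documented for the service.
sample_response = json.dumps({
    "Results": {
        "output1": {
            "type": "DataTable",
            "value": {
                "ColumnNames": ["Scored Labels"],
                "ColumnTypes": ["String"],
                "Values": [["label_a"], ["label_b"]],
            },
        }
    }
})

labels = extract_labels(sample_response)
print(labels)
```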
# Python 3 client; urllib2 from Python 2 is replaced by urllib.request and urllib.error
import json
import urllib.error
import urllib.request

# Request payload: one inner list per review to classify
data = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["review_text"],
            "Values": [["value"], ["value"]],
        },
    },
    "GlobalParameters": {},
}
body = json.dumps(data).encode("utf-8")
url = 'https://ussouthcentral.services.azureml.net/workspaces/41654fb2238f449daf8dc7954f22ee9b/services/128f31001e794cd19ab042010d1f4a0e/execute?api-version=2.0&details=true'
api_key = 'FrykEwu75lBeqhdG/Iz8NdwmY0GOrAJVNl+f+BcXCPChMrDkYRL4S/F7E33YxFRkJrese1giJ9NrWOBxJjWgag=='  # API key for the web service
headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + api_key}
req = urllib.request.Request(url, body, headers)
try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read()))
If you are interested in learning more about the process we followed, please refer to our papers.
Wajdi Aljedaani, Mohamed Wiem Mkaouer, Stephanie Ludi, Ali Ouni, Ilyes Jenhani, "On the Identification of Accessibility Bug Reports in Open Source Systems", published at the 19th International Web for All Conference (W4A'22). [preprint]
Wajdi Aljedaani, Mohamed Wiem Mkaouer, Stephanie Ludi, Yasir Javed, "Automatic Classification of Accessibility User Reviews in Android Apps", published at the 7th International Conference on Data Science and Machine Learning Applications (CDMA'22). [preprint]
Wajdi Aljedaani, Furqan Rustam, Stephanie Ludi, Ali Ouni, Mohamed Wiem Mkaouer, "Learning Sentiment Analysis for Accessibility User Reviews", published at the 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW'21). [preprint]