Learning to classify software defects from crowds: A novel approach

Hernández-González, Jerónimo; Rodriguez, Daniel; Inza, Iñaki; Harrison, Rachel; Lozano, Jose A.

Journal Article

Abstract

In software engineering, associating each reported defect with a category supports, among many other things, the appropriate allocation of resources. Although this classification task can be automated using standard machine learning techniques, categorizing defects for model training requires expert knowledge, which is not always available. To circumvent this dependency, we propose applying the learning from crowds paradigm, in which training categories are obtained from multiple non-expert annotators (and so may be incomplete, noisy or erroneous) and classifiers are learnt efficiently from this subjective class information. To illustrate our proposal, we present two real applications of IBM’s orthogonal defect classification to the issue tracking systems of two different real domains. Bayesian network classifiers, learnt with two state-of-the-art methodologies from data labeled by a crowd of annotators, are used to predict the category (impact) of reported software defects. According to different metrics, the considered methodologies show enhanced performance over the straightforward solution (majority voting). This shows the potential of non-expert knowledge aggregation techniques when expert knowledge is unavailable.
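The majority-voting baseline named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the two crowd-learning methodologies they evaluate. The function name `majority_vote`, the report identifiers, and the example labels are hypothetical and chosen only for illustration. Missing annotations are represented as `None`, and ties are assumed to be broken in favour of the label seen first.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the annotators who answered.

    `labels` holds one entry per annotator; None marks a missing annotation.
    Ties go to the label that appears first (an assumption of this sketch).
    """
    votes = Counter(label for label in labels if label is not None)
    if not votes:
        return None  # no annotator labeled this report
    return votes.most_common(1)[0][0]


# Hypothetical crowd annotations: one list of impact labels per defect report.
crowd_labels = {
    "report-1": ["Reliability", "Reliability", "Capability"],
    "report-2": ["Usability", None, "Usability"],
    "report-3": [None, None, None],
}

# Aggregate each report's annotations into a single training label.
consensus = {rid: majority_vote(votes) for rid, votes in crowd_labels.items()}
```

The aggregated labels could then be used to train any standard classifier. That pipeline is the baseline the abstract compares against, because it discards how strongly the annotators disagreed and how reliable each one is.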

Attached files

fulltext.pdf

Type: PDF Document
Filename: fulltext.pdf
Size: 491.13 KB
Views (since Sept 2022): 234

Authors

Hernández-González, Jerónimo
Rodriguez, Daniel
Inza, Iñaki
Harrison, Rachel
Lozano, Jose A.

Oxford Brookes departments

Faculty of Technology, Design and Environment\Department of Computing and Communication Technologies

Dates

Year of publication: 2017
Date of RADAR deposit: 2017-12-04

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Related resources

This RADAR resource is the Accepted Manuscript of Learning to classify software defects from crowds: A novel approach
This RADAR resource is Cited by Two datasets of defect reports labeled by a crowd of annotators of unknown reliability

Details

Owner: Joseph Ripp
Collection: Outputs
Version: 1
Status: Live
Views (since Sept 2022): 481