TY - GEN
T1 - Discovery of patterns in software metrics using clustering techniques
AU - Del Alamo, Cristian J. López
AU - Pizarro, Diego Aracena
AU - Pinto, Ricardo Valdivia
PY - 2012
Y1 - 2012
N2 - One mechanism for estimating software quality is the use of metrics, which are functions that evaluate certain characteristics of the quality of the product and its development. A software product can be evaluated from different points of view, so the results of the evaluations are numeric vectors that together describe the quality of the software. This research uses open-access data from NASA, reduces its dimensionality by principal component analysis (PCA), applies three clustering techniques, and selects the best grouping using the Rand index. Finally, the top clusters are analyzed with regression to find the metrics that are related to software errors. The results suggest that groups of software modules whose source code has a higher average number of blank lines show a higher error density. This could be interpreted as an indication of the orderliness of the implementation. The results also show a direct relationship between the number of errors in a module and the number of function calls it makes to other modules. The contribution of this work lies in combining clustering evaluation techniques, dimensionality reduction, clustering algorithms, and regression to discover patterns in software metrics in a rigorous manner.
AB - One mechanism for estimating software quality is the use of metrics, which are functions that evaluate certain characteristics of the quality of the product and its development. A software product can be evaluated from different points of view, so the results of the evaluations are numeric vectors that together describe the quality of the software. This research uses open-access data from NASA, reduces its dimensionality by principal component analysis (PCA), applies three clustering techniques, and selects the best grouping using the Rand index. Finally, the top clusters are analyzed with regression to find the metrics that are related to software errors. The results suggest that groups of software modules whose source code has a higher average number of blank lines show a higher error density. This could be interpreted as an indication of the orderliness of the implementation. The results also show a direct relationship between the number of errors in a module and the number of function calls it makes to other modules. The contribution of this work lies in combining clustering evaluation techniques, dimensionality reduction, clustering algorithms, and regression to discover patterns in software metrics in a rigorous manner.
KW - Boot-strapping
KW - Data Mining
KW - Principal component analysis
KW - clustering
KW - software metric
UR - http://www.scopus.com/inward/record.url?scp=84874293285&partnerID=8YFLogxK
U2 - 10.1109/CLEI.2012.6427229
DO - 10.1109/CLEI.2012.6427229
M3 - Conference contribution
AN - SCOPUS:84874293285
SN - 9781467307932
T3 - 38th Latin America Conference on Informatics, CLEI 2012 - Conference Proceedings
BT - 38th Latin America Conference on Informatics, CLEI 2012 - Conference Proceedings
T2 - 38th Latin America Conference on Informatics, CLEI 2012
Y2 - 1 October 2012 through 5 October 2012
ER -