A distance metric for ordinal data based on misclassification

Dreas Nielsen

Published: Jan 20, 2024

DOI: https://doi.org/10.58205/jiamcs.v3i2.83

Keywords:

ordinal, distance, multinomial, categorical, misclassification

Dreas Nielsen

Integral Consulting Inc., 508 Yale Ave. N. Suite 204, Seattle WA 98109, United States

https://orcid.org/0009-0008-2698-3611

Abstract

Distances between data sets are used in analyses such as classification and clustering. Some existing distance metrics, such as the Manhattan (City Block or L1) distance, are suitable for use with categorical data where the data subtype is numeric or, more specifically, integer. However, ordinality of categories imposes additional constraints on data distributions, and the ordering of categories should be considered in the calculation of distances. A new distance metric is presented here that is based on the number of misclassifications that must have occurred within one data set if it were in fact identical to another data set. This "misclassification distance" is equivalent to the number of reclassifications necessary to transform one data set into another. The metric takes account not only of the numbers of observations in corresponding ordinal categories, but also of the number of categories across which observations must be moved to correct all misclassifications. Each stepwise movement of an observation across one or more categories that is required to equalize the distributions increases the distance; the method is therefore referred to as a stepwise ordinal misclassification distance (SOMD). An algorithm is provided for the calculation of this metric.
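The idea of counting stepwise reclassifications can be illustrated with a minimal Python sketch. It is not the algorithm published in the article, which is not reproduced on this page. It assumes that each data set is a list of integer counts per ordinal category, that both lists have the same total, and that moving one observation across one category boundary costs one step. The function names `manhattan_distance` and `stepwise_move_count` are hypothetical.

```python
from itertools import accumulate


def manhattan_distance(a, b):
    """L1 distance between two count vectors; ignores category order."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def stepwise_move_count(a, b):
    """Minimum number of one-category moves that turn counts a into counts b.

    Assumption (not taken from the article): totals are equal and every
    move of one observation to an adjacent category costs one step.
    """
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    if sum(a) != sum(b):
        raise ValueError("this sketch assumes equal totals")
    # Running surplus of a over b: after category i, this many observations
    # must cross the boundary between category i and category i + 1.
    surplus = accumulate(x - y for x, y in zip(a, b))
    return sum(abs(s) for s in surplus)


low = [3, 0, 0]   # all observations in the lowest category
mid = [0, 3, 0]   # all observations in the middle category
high = [0, 0, 3]  # all observations in the highest category

print(manhattan_distance(low, mid), manhattan_distance(low, high))
print(stepwise_move_count(low, mid), stepwise_move_count(low, high))
```

Under these assumptions the Manhattan distance is the same for both pairs, because it only compares counts category by category, while the step count is larger for the pair whose observations lie further apart in the ordering.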

How to Cite

Nielsen, D. 2024. A distance metric for ordinal data based on misclassification. Journal of Innovative Applied Mathematics and Computational Sciences. 3, 2 (Jan. 2024), 156–161. DOI:https://doi.org/10.58205/jiamcs.v3i2.83.

ARK

https://n2t.net/ark:/49935/jiamcs.v3i2.83

Issue

Vol. 3 No. 2 (2023): July-December

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Authors keep the rights and guarantee the Journal of Innovative Applied Mathematics and Computational Sciences the right to be the first publication of the document, licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License that allows others to share the work with an acknowledgement of authorship and publication in the journal.
Authors are allowed and encouraged to spread their work through electronic means using personal or institutional websites (institutional open archives, personal websites or professional and academic networks profiles) once the text has been published.

References

Shirkhorshidi, A., Aghabozorgi, S., & Wah, T. (2015). A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS ONE, 10(12). https://doi.org/10.1371/journal.pone.0144059

Tabek, J. (2014). Geometry: the language of space and form. In History of Mathematics. Infobase Publishing.

Jajuga, K., Walesiak, M., & Bak, A. (2003). On the general distance measure. In Exploratory Data Analysis in Empirical Research. Berlin: Springer.

Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Surveys, 33(1), 31-88. https://doi.org/10.1145/375360.375365

Likert, R. (1932). A technique for the measurement of attitudes. Arch. Psychol., 22(140), 1-55.

Zaborski, A. (2013). Distance measures in aggregating preference data. Folia Oeconomica, 3(302), 183-190.

Fernández, D., & Pledger, S. (2015). Categorising count data into ordinal responses with application to ecological communities. Journal of Agricultural, Biological, and Environmental Statistics, 21, 348-362. https://doi.org/10.1007/s13253-015-0240-3

Panaretos, V., & Zemel, Y. (2019). Statistical aspects of Wasserstein distances. Annual Review of Statistics and Its Application, 6(1), 405-431. https://doi.org/10.1146/annurev-statistics-030718-104938

Kleindessner, M., & von Luxburg, U. (2017). Lens depth function and k-relative neighborhood graph: versatile tools for ordinal data analysis. Journal of Machine Learning Research, 18(58), 1-52. https://doi.org/10.48550/arXiv.1602.07194

Cicirello, V. (2020). Kendall tau sequence distance: extending Kendall tau from ranks to sequences. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 7(23), 1-20. https://doi.org/10.4108/eai.13-7-2018.163925

Walesiak, M. (1999). Distance measure for ordinal data. Argumenta Oeconomica, 2, 167-173.

Cook, W. (2006). Distance-based and ad hoc consensus models in ordinal preference ranking. European Journal of Operational Research, 172, 369-385. https://doi.org/10.1016/j.ejor.2005.03.048
