Getting Started

What is common_datasets?

This package provides an unofficial collection of datasets widely used to evaluate machine learning techniques, mainly small and imbalanced datasets for binary classification, multiclass classification, and regression. The datasets follow the usual sklearn.datasets format, with missing values imputed and categorical and ordinal features encoded.
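
As a rough illustration of what imputation and encoding mean here, the sketch below applies mean imputation and one-hot encoding to a few toy records and collects the result under 'data' and 'target' keys in the spirit of sklearn.datasets. Everything in it is an assumption made for illustration: the records, the helper names impute_mean and one_hot, and the choice of mean imputation and one-hot encoding are hypothetical and are not taken from this package, which may preprocess its datasets differently.

```python
from statistics import mean

# Toy records (hypothetical): one numeric feature with a missing value,
# one categorical feature, and a binary label.
rows = [
    {"age": 30.0, "color": "red", "label": 0},
    {"age": None, "color": "blue", "label": 1},
    {"age": 50.0, "color": "red", "label": 0},
]


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator columns, in sorted category order."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, encoded


ages = impute_mean([r["age"] for r in rows])
categories, indicators = one_hot([r["color"] for r in rows])

# Assemble a dictionary that mirrors the 'data' / 'target' naming
# used by sklearn.datasets loaders (an assumed layout, for illustration).
dataset = {
    "data": [[age] + flags for age, flags in zip(ages, indicators)],
    "target": [r["label"] for r in rows],
    "feature_names": ["age"] + [f"color_{c}" for c in categories],
}
```

After this runs, every row of dataset["data"] is fully numeric, so it can be passed to an estimator without further cleaning.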

PLEASE DO NOT CITE OR REFER TO THIS PACKAGE IN ANY FORM!

Please cite the original works publishing and specifying these datasets:

@article{keel,
  author={Alcala-Fdez, J. and Fernandez, A. and Luengo, J. and Derrac, J. and Garcia, S.
          and Sanchez, L. and Herrera, F.},
  title={KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms
          and Experimental Analysis Framework},
  journal={Journal of Multiple-Valued Logic and Soft Computing},
  volume={17},
  number={2-3},
  year={2011},
  pages={255--287}}

@misc{uci,
  author={Dua, Dheeru and Karra Taniskidou, Efi},
  year={2017},
  title={{UCI} Machine Learning Repository},
  url={http://archive.ics.uci.edu/ml},
  institution={University of California, Irvine, School of Information and Computer Sciences}}

@article{krnn,
  author={X. J. Zhang and Z. Tari and M. Cheriet},
  title={{KRNN}: k {Rare-class Nearest Neighbor} classification},
  journal={Pattern Recognition},
  year={2017},
  volume={62},
  number={2},
  pages={33--44}}