Exploring the limits of an RBSC-based approach in solving the subset selection problem

11 pages. Published: September 20, 2022

Abstract

This study focuses on the subset selection problem of computational statistics and deploys the rank-biserial correlation (RBSC) based deck generation algorithm (RBSC-SubGen) [1] to solve it. RBSC-SubGen was originally designed to automatically build a desired number of vocabulary decks (out of a large corpus) with a desired level of word frequency relation, a task that shares many aspects with the generic subset selection problem. In this article, we consider applying it not only to word corpora but to any set of ranked items, and we study its resilience against various hyper-parameters, which were not addressed in previous studies. Namely, based on simulations, we test RBSC-SubGen under various constraints and indicate its vulnerable aspects in terms of rate of saturation, computational cost, and accuracy of the obtained solution.
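The abstract does not define the rank-biserial correlation, so the following is only a minimal sketch of that measure for two subsets of ranked items, assuming Kerby's simple difference formula (share of favorable pairs minus share of unfavorable pairs). It is not the RBSC-SubGen algorithm itself; the function name `rank_biserial` and the example decks are hypothetical.

```python
from itertools import product


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation between two groups of ranks.

    Assumes Kerby's simple difference formula:
    (favorable pairs - unfavorable pairs) / total pairs,
    where a pair (a, b) is favorable when a > b.
    Tied pairs count toward the total but toward neither side.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    favorable = 0
    unfavorable = 0
    for a, b in product(group_a, group_b):
        if a > b:
            favorable += 1
        elif a < b:
            unfavorable += 1
    total = len(group_a) * len(group_b)
    return (favorable - unfavorable) / total


# Hypothetical decks given as ranks of their items (e.g. word frequency ranks).
deck_low = [1, 2, 3]
deck_high = [4, 5, 6]
r_separated = rank_biserial(deck_low, deck_high)      # fully separated decks
r_interleaved = rank_biserial([1, 3, 5], [2, 4, 6])   # interleaved decks
```

Under this formula, fully separated decks yield a correlation of magnitude 1, while interleaved decks yield a value closer to 0, which is the kind of relation between subsets that a target RBSC level would constrain.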

Keyphrases: computational statistics, ranking, stochastic systems, subset selection

In: Tokuro Matsuo (editor). Proceedings of 11th International Congress on Advanced Applied Informatics, vol 81, pages 1-11.

BibTeX entry
@inproceedings{IIAIAAI2021-Winter:Exploring_limits_RBSC_based,
  author    = {Kohei Furuya and Zeynep Yucel and Parisa Supitayakul and Akito Monden and Pattara Leelaprute},
  title     = {Exploring the limits of an RBSC-based approach in solving the subset selection problem},
  booktitle = {Proceedings of 11th International Congress on Advanced Applied Informatics},
  editor    = {Tokuro Matsuo},
  series    = {EPiC Series in Computing},
  volume    = {81},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2398-7340},
  url       = {/publications/paper/p2n2},
  doi       = {10.29007/113l},
  pages     = {1-11},
  year      = {2022}}