Exploring the limits of an RBSC-based approach in solving the subset selection problem

11 pages • Published: September 20, 2022

Abstract

This study focuses on the subset selection problem of computational statistics and deploys the rank-biserial correlation (RBSC)-based deck generation algorithm (RBSC-SubGen) [1] to solve it. RBSC-SubGen was originally designed to automatically build a desired number of vocabulary decks from a large corpus with a desired level of word-frequency relation, a task that shares many aspects with the generic subset selection problem. In this article, we consider applying it not only to word corpora but to any set of ranked items, and we study its resilience against various hyperparameters that were not treated in previous studies. Namely, using simulations, we test RBSC-SubGen under various constraints and identify its vulnerable aspects in terms of rate of saturation, computational cost, and accuracy of the obtained solution.

Keyphrases: computational statistics, ranking, stochastic systems, subset selection

In: Tokuro Matsuo (editor). Proceedings of the 11th International Congress on Advanced Applied Informatics, vol 81, pages 1-11.
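The statistic that the method is built on, the rank-biserial correlation, can be illustrated with a short sketch. This is not RBSC-SubGen itself. It only computes the correlation between two groups of ranked items (for example, the frequency ranks of words in two decks) using Kerby's simple difference formula. The function name rank_biserial, the sign convention (positive when the first group tends to hold larger values), and the example values are all hypothetical assumptions made for illustration.

```python
from itertools import product
from typing import Sequence


def rank_biserial(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Rank-biserial correlation via Kerby's simple difference formula.

    Hypothetical helper (not from the paper). Over all len(group_a) * len(group_b)
    cross-group pairs, it returns the proportion of pairs where the item from
    group_a is larger, minus the proportion where it is smaller. Tied pairs count
    as neither, so the result lies in [-1, 1].
    """
    if len(group_a) == 0 or len(group_b) == 0:
        raise ValueError("both groups must be non-empty")

    favourable = 0    # pairs where the group_a item is larger
    unfavourable = 0  # pairs where the group_a item is smaller
    for a, b in product(group_a, group_b):
        if a > b:
            favourable += 1
        elif a < b:
            unfavourable += 1

    return (favourable - unfavourable) / (len(group_a) * len(group_b))


if __name__ == "__main__":
    # Assumed example ranks for two hypothetical decks.
    print(rank_biserial([5, 6, 7], [1, 2, 3]))  # every pair favours group_a
    print(rank_biserial([1, 4], [2, 3]))        # balanced pairs
```

With these inputs, rank_biserial([5, 6, 7], [1, 2, 3]) returns 1.0 and rank_biserial([1, 4], [2, 3]) returns 0.0. The pairwise loop costs O(n_a · n_b) and is chosen here for readability. For large sets of ranked items, the same value can be obtained from the Mann-Whitney U statistic computed on pooled ranks.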
|