An Approach for Evaluating Semantic Similarity in Research Papers via Siamese BERT Architecture

EasyChair Preprint 15448
12 pages • Date: November 20, 2024

Abstract

Document similarity analysis is critical for NLP tasks such as information retrieval and plagiarism detection. Traditional methods based on word-to-word mapping struggle to capture contextual nuances, and existing solutions lack domain-specific accuracy and an enriched search experience. One such domain is finding similar research papers: researchers often struggle to find papers similar to a given paper and have to rely on basic keyword-based search, which fails to return the best match based on the overall context. In this work, we propose a novel methodology that integrates BERT with a Siamese Neural Network to capture the semantic textual similarity of research papers. Our approach goes beyond simple similarity evaluation by conducting a nuanced semantic search over the overall context and producing a representative similarity score, which offers a more accurate and refined search experience. Furthermore, we curate a dataset of over 10,000 NLP research paper abstracts to train our model. The model excels at identifying the contextual relationships between documents, making it highly effective for domain-specific applications. It can significantly improve the user experience in document retrieval systems, particularly for academic research and recommendation.

Keyphrases: BERT, Data Science, NLP, Siamese Neural Network, semantic similarity
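
The Siamese scoring pattern named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify pooling, loss, or how the score is computed, so cosine similarity is assumed here, and a toy hashed bag-of-words encoder (toy_encode, a hypothetical name) stands in for BERT so that the example runs without model weights. The only point it shows is that both abstracts pass through the same encoder before their embeddings are compared.

```python
import hashlib
import math
import re

DIM = 64  # toy embedding size; an assumption, not a value from the paper


def toy_encode(text):
    """Placeholder for a BERT encoder: hashed bag-of-words, averaged over tokens.

    A real system would return a pooled contextual embedding instead.
    """
    vec = [0.0] * DIM
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for tok in tokens:
        # Deterministic bucket for each token.
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def siamese_similarity(abstract_a, abstract_b, encode=toy_encode):
    """Score two texts by encoding both with the SAME encoder (shared weights)."""
    return cosine(encode(abstract_a), encode(abstract_b))


if __name__ == "__main__":
    a = "We fine-tune BERT for semantic textual similarity of abstracts."
    b = "A Siamese network compares sentence embeddings from BERT."
    print(round(siamese_similarity(a, b), 3))
```

Replacing toy_encode with a function that returns a pooled BERT embedding would leave siamese_similarity unchanged, which is the appeal of the shared-encoder design: each document can be embedded once and compared against many others.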