Research Article · Open Access · Orclever Native
Hybrid Question-Answering System: A FAISS and BM25 Approach for Extracting Information from Technical Document
Teracity Yazılım Teknolojileri A.Ş.
Published: December 31, 2024
DOI: 10.56038/oprd.v5i1.535
Vol. 5, No. 1 · pp. 226–237
Abstract
In this study, a hybrid question-answering system was developed to accelerate access to information contained in corporate technical documents and to generate appropriate responses to user queries. The system combines dense vector-based retrieval (FAISS) and sparse text-based retrieval (BM25) methods, integrated with the XLM-RoBERTa Large model. Evaluations conducted on a dataset consisting of 23 technical documents demonstrated the system's effectiveness in responding to both semantic and keyword-based queries. This study presents an innovative approach that enables fast and accurate access to information from technical documents, enhancing the efficiency of corporate knowledge management processes.
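The combination of sparse and dense retrieval described above can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not state how the two scores are merged, so the weighted sum of min-max normalized scores, the weight `alpha`, and all function names here are assumptions. The hand-written three-dimensional vectors stand in for XLM-RoBERTa embeddings, and the exhaustive cosine comparison stands in for a FAISS index lookup, so that the example runs with the Python standard library alone.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query (sparse, keyword-based)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity of the query vector to each document vector (dense).
    Assumes non-zero vectors; exhaustive search stands in for a FAISS index."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    qn = norm(query_vec)
    return [sum(q * d for q, d in zip(query_vec, v)) / (qn * norm(v))
            for v in doc_vecs]

def min_max(values):
    # Rescale scores to [0, 1] so sparse and dense scores are comparable.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Rank document indices by alpha * dense + (1 - alpha) * sparse (assumed fusion)."""
    sparse = min_max(bm25_scores(query, docs))
    dense = min_max(cosine_scores(query_vec, doc_vecs))
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

# Hypothetical toy corpus and hand-made "embeddings" for illustration only.
DOCS = [
    "reset the router by holding the power button",
    "firmware update procedure for the gateway device",
    "warranty terms and return policy",
]
DOC_VECS = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.0, 0.1, 0.9]]
QUERY = "router reset"
QUERY_VEC = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    print(hybrid_rank(QUERY, QUERY_VEC, DOCS, DOC_VECS))
```

In this toy setup only the first document matches the query keywords, while the second is close to the query in the vector space, so the dense score is what lifts it above the unrelated third document. Setting `alpha` closer to 1 favors semantic similarity; closer to 0 favors exact keyword matches.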
Keywords
Information Extraction, Question-Answering Systems, FAISS, BM25, Technical Documents, Corporate Knowledge Management
Cite This Article
Hakdağlı, Ö. (2024). Hybrid Question-Answering System: A FAISS and BM25 Approach for Extracting Information from Technical Document. *Orclever Proceedings of Research and Development*, 5(1), 226-237. https://doi.org/10.56038/oprd.v5i1.535
Bibliographic Info
Journal: Orclever Proceedings of Research and Development
Volume: 5
Issue: 1
Pages: 226–237
Published: December 31, 2024
eISSN: 2980-020X