Research Article | Open Access

An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment

Pınar Ersoy¹, Mustafa Erşahin²
¹ ² Commencis Teknoloji
Published: November 23, 2025

Abstract

Aspect-based sentiment analysis provides granular insights into customer feedback by identifying discrete aspects, such as features or topics, and assigning a corresponding sentiment to each. This study assesses three large language models (LLMs), namely Google Gemini 2.5 Flash-Lite, Anthropic Claude Sonnet-4 delivered through AWS Bedrock, and Meta LLaMA 3.3 70B delivered through AWS Bedrock, using a real-world multilingual corpus of 7,841 Turkish mobile banking app reviews from İşbank in Turkey. We employ a prompt-based tagging protocol to extract aspect–sentiment pairs from every review, and we compare accuracy, F1-score, inference cost, and latency. The results show that all three LLMs can execute multilingual aspect extraction and sentiment categorization without task-specific fine-tuning. Claude Sonnet-4 attains the highest F1 for aspect extraction and the highest sentiment accuracy, although it incurs a markedly higher inference cost. Gemini 2.5 Flash-Lite achieves competitive accuracy at a fraction of the price, making it well-suited for high-volume analytics. Meta LLaMA at the 70B scale accessed through AWS Bedrock exhibits intermediate performance with moderate cost and latency. We provide detailed performance tables and figures, along with best-practice guidance for enterprise deployment. Within scale, accuracy, and budget constraints, AWS Bedrock enables the strategic selection of Claude Sonnet-4 or LLaMA 3.3 70B for multilingual sentiment analysis, yielding valuable insights from app reviews.
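The prompt-based tagging protocol described above can be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the template wording, the JSON output schema, and the mocked model reply are all assumptions, and the real pipeline would send the prompt to Gemini, Claude, or LLaMA via their respective APIs instead of using a hard-coded reply.

```python
import json

# Hypothetical prompt template: the model is asked to return every
# aspect-sentiment pair in the review as a JSON list.
PROMPT_TEMPLATE = (
    "Extract every aspect mentioned in the following mobile banking app review "
    "and label each with a sentiment (positive, negative, or neutral). "
    'Answer only with a JSON list of {{"aspect": ..., "sentiment": ...}} objects.\n\n'
    "Review: {review}"
)

def build_prompt(review: str) -> str:
    """Fill the tagging prompt with a single review."""
    return PROMPT_TEMPLATE.format(review=review)

def parse_pairs(model_output: str) -> list[tuple[str, str]]:
    """Parse the model's JSON reply into (aspect, sentiment) tuples."""
    return [(item["aspect"], item["sentiment"]) for item in json.loads(model_output)]

# Example with a mocked model reply (no API call is made here).
review = "Login is fast but the transfer screen keeps crashing."
prompt = build_prompt(review)
mock_reply = (
    '[{"aspect": "login", "sentiment": "positive"},'
    ' {"aspect": "transfer screen", "sentiment": "negative"}]'
)
pairs = parse_pairs(mock_reply)
print(pairs)  # [('login', 'positive'), ('transfer screen', 'negative')]
```

In practice, accuracy, F1, cost, and latency would then be computed by comparing such extracted pairs against human-annotated gold labels across the full 7,841-review corpus.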

Keywords
Natural Language Processing, Sentiment Analysis, Generative AI, Large Language Models

References

  1. M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 Task 4: Aspect Based Sentiment Analysis,” Proceedings of the 8th International Workshop on Semantic Evaluation, pp. 27–35, 2014.
  2. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  3. D. D. Lee and H. S. Seung, “Learning the Parts of Objects by Non-Negative Matrix Factorization,” Nature, vol. 401, pp. 788–791, 1999.
  4. C. Wu, B. Ma, Z. Zhang, N. Deng, Y. He, and Y. Xue, “Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models,” arXiv preprint arXiv:2412.12564, 2024.
  5. P. F. Simmering, R. Werkmeister, and L. Di Stasio, “Large Language Models for Aspect-Based Sentiment Analysis,” arXiv preprint arXiv:2310.18025, 2023.
  6. M. Água, P. Pina, and B. Ribeiro, “Large Language Models Powered Aspect-Based Sentiment Analysis for Enhanced Customer Insights,” Tourism and Management Studies, vol. 21, 2025.
  7. J. Šmíd, M. Bělohlávek, and T. Brychcín, “LLaMA-Based Models for Aspect-Based Sentiment Analysis,” Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 66–78, 2024.
Cite This Article
Ersoy, P., Erşahin, M. (2025). An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment. *The European Journal of Research and Development*, 5(1), 149-163. https://doi.org/10.56038/ejrnd.v5i1.659

Bibliographic Info

Journal: The European Journal of Research and Development
Volume: 5
Issue: 1
Pages: 149–163
Published: November 23, 2025
eISSN: 2822-2296