Research Article | Open Access

An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment

Pınar Ersoy¹, Mustafa Erşahin²
¹ ² Commencis Teknoloji
Published: November 23, 2025

Abstract

Aspect-based sentiment analysis provides granular insights into customer feedback by identifying discrete aspects, such as features or topics, and assigning a corresponding sentiment to each. This study assesses three large language models (LLMs), namely Google Gemini 2.5 Flash-Lite, Anthropic Claude Sonnet-4 delivered through AWS Bedrock, and Meta LLaMA 3.3 70B delivered through AWS Bedrock, using a real-world multilingual corpus of 7,841 Turkish mobile banking app reviews from İşbank in Turkey. We employ a prompt-based tagging protocol to extract aspect–sentiment pairs from every review, and we compare accuracy, F1-score, inference cost, and latency. The results show that all three LLMs can execute multilingual aspect extraction and sentiment categorization without task-specific fine-tuning. Claude Sonnet-4 attains the highest F1 for aspect extraction and the highest sentiment accuracy, although it incurs a markedly higher inference cost. Gemini 2.5 Flash-Lite achieves competitive accuracy at a fraction of the price, making it well-suited for high-volume analytics. Meta LLaMA at the 70B scale accessed through AWS Bedrock exhibits intermediate performance with moderate cost and latency. We provide detailed performance tables and figures, along with best-practice guidance for enterprise deployment. Within scale, accuracy, and budget constraints, AWS Bedrock enables the strategic selection of Claude Sonnet-4 or LLaMA 3.3 70B for multilingual sentiment analysis, yielding valuable insights from app reviews.
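The prompt-based tagging protocol described above can be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the template wording, the JSON output schema, and the mocked model reply are all assumptions, and the real pipeline would send the prompt to Gemini, Claude, or LLaMA via their respective APIs instead of using a hard-coded reply.

```python
import json

# Hypothetical prompt template: the model is asked to return every
# aspect-sentiment pair in the review as a JSON list.
PROMPT_TEMPLATE = (
    "Extract every aspect mentioned in the following mobile banking app review "
    "and label each with a sentiment (positive, negative, or neutral). "
    'Answer only with a JSON list of {{"aspect": ..., "sentiment": ...}} objects.\n\n'
    "Review: {review}"
)

def build_prompt(review: str) -> str:
    """Fill the tagging prompt with a single review."""
    return PROMPT_TEMPLATE.format(review=review)

def parse_pairs(model_output: str) -> list[tuple[str, str]]:
    """Parse the model's JSON reply into (aspect, sentiment) tuples."""
    return [(item["aspect"], item["sentiment"]) for item in json.loads(model_output)]

# Example with a mocked model reply (no API call is made here).
review = "Login is fast but the transfer screen keeps crashing."
prompt = build_prompt(review)
mock_reply = (
    '[{"aspect": "login", "sentiment": "positive"},'
    ' {"aspect": "transfer screen", "sentiment": "negative"}]'
)
pairs = parse_pairs(mock_reply)
print(pairs)  # [('login', 'positive'), ('transfer screen', 'negative')]
```

In practice, accuracy, F1, cost, and latency would then be computed by comparing such extracted pairs against human-annotated gold labels across the full 7,841-review corpus.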

Keywords
Natural Language Processing, Sentiment Analysis, Generative AI, Large Language Models

References

  1. M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar, “SemEval-2014 Task 4: Aspect Based Sentiment Analysis,” Proceedings of the 8th International Workshop on Semantic Evaluation, pp. 27–35, 2014.
  2. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  3. D. D. Lee and H. S. Seung, “Learning the Parts of Objects by Non-Negative Matrix Factorization,” Nature, vol. 401, pp. 788–791, 1999.
  4. C. Wu, B. Ma, Z. Zhang, N. Deng, Y. He, and Y. Xue, “Evaluating Zero-Shot Multilingual Aspect-Based Sentiment Analysis with Large Language Models,” arXiv preprint arXiv:2412.12564, 2024.
  5. P. F. Simmering, R. Werkmeister, and L. Di Stasio, “Large Language Models for Aspect-Based Sentiment Analysis,” arXiv preprint arXiv:2310.18025, 2023.
  6. M. Água, P. Pina, and B. Ribeiro, “Large Language Models Powered Aspect-Based Sentiment Analysis for Enhanced Customer Insights,” Tourism and Management Studies, vol. 21, 2025.
  7. J. Šmíd, M. Bělohlávek, and T. Brychcín, “LLaMA-Based Models for Aspect-Based Sentiment Analysis,” Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 66–78, 2024.
Cite This Article
Ersoy, P., Erşahin, M. (2025). An Empirical Comparison of Claude, Llama, and Gemini for Aspect-Level Sentiment. *The European Journal of Research and Development*, 5(1), 149-163. https://doi.org/10.56038/ejrnd.v5i1.659

Bibliographic Info

Journal: The European Journal of Research and Development
Volume: 5
Issue: 1
Pages: 149–163
Published: November 23, 2025
eISSN: 2822-2296