arXiv cs.AI INT ai 2026-04-28 13:00

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

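Analysis results

Category: AI
Importance: 69
Trend score: 28
Summary: Advances in large audio-language models (LALMs) have raised expectations for large language models (LLMs) equipped with auditory capabilities. This survey examines how these models are evaluated.
Keywords: models auditory large language survey advancements taxonomy evaluations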

arXiv:2505.15957v4 Announce Type: replace-cross

Abstract: With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs' performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear guidelines for the community. We will release the collection of the surveyed papers and actively maintain it to support ongoing advancements in the field.
