🎙️ ACL-25 SpeechIQ Leaderboard

🎯 Welcome to the SpeechIQ Leaderboard!

This leaderboard presents evaluation results for voice-understanding large language models (LLMVoice) using our novel SpeechIQ evaluation framework. The Speech IQ Score provides a unified metric for comparing cascaded methods (ASR+LLM) with end-to-end models.

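| Rank | Speech IQ | Remember | Understand | Apply | Model Type | Setup | Audio Encoder |
|---|---|---|---|---|---|---|---|
| 🥇 | 108.64 | -1.885 | -1.604 | -1.146 | Agentic: ASR + GER + LLM | Whisper_v2-1.5B + GPT-4o + Qwen2_7B | OWSM-CTC_v3.1-1B |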

📋 Column Explanations

  • Rank: Position in the ranking, with 🥇🥈🥉 medals for the top three performers
  • Speech IQ: Overall intelligence quotient combining the Remember, Understand, and Apply dimensions (primary metric)
  • Remember: Verbatim accuracy score (WER-based)
  • Understand: Semantic interpretation similarity score
  • Apply: Downstream task performance score
  • Model Type: Architecture approach (Agentic vs End2End)
  • Setup: Specific model configuration and components
  • Audio Encoder: The audio processing component used

Higher scores indicate better performance across all metrics.
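The Remember column is described only as WER-based; how WER is mapped onto the Remember score is not stated here. As a hedged illustration of the underlying quantity, the sketch below computes the standard word error rate: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The function name is our own, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference word count.

    Assumes plain whitespace tokenization; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words and the
    # first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis prefix
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER is unbounded above (many insertions can push it past 1.0), and lower is better, so any score where higher is better must transform it in some way that this sketch does not attempt to reproduce.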

📊 Leaderboard Statistics

| Metric | Value |
|---|---|
| 🏆 Top Performer | Whisper_v2-1.5B + GPT-4o + Qwen2_7B |
| 🎯 Highest Score | 108.64 |
| 🤖 Best Agentic Model | 108.64 |
| 🔄 Best End2End Model | 107.85 |
| 📈 Total Models | 13 |
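These statistics, like the medal ranks described under Column Explanations, are simple aggregates over the leaderboard rows. The leaderboard's own code is not shown here, so the Python below is only a sketch of one way to derive them: sort entries by Speech IQ, label the top three with medals, and take the per-type maximum. The `Entry` class, its simplified `model_type` labels ("Agentic" / "End2End"), the function names, and the four example rows with made-up scores are all hypothetical. Ties keep their input order rather than sharing a rank.

```python
from dataclasses import dataclass

# Medal labels for the top three ranks; later ranks get their 1-based position.
MEDALS = ("🥇", "🥈", "🥉")


@dataclass(frozen=True)
class Entry:
    setup: str        # the "Setup" column
    model_type: str   # hypothetical simplified family label, e.g. "Agentic" or "End2End"
    speech_iq: float  # the primary "Speech IQ" metric


def rank_entries(entries: list[Entry]) -> list[tuple[str, Entry]]:
    """Sort by Speech IQ (highest first) and attach a medal or numeric rank label."""
    # sorted() is stable even with reverse=True, so tied entries keep input order.
    ordered = sorted(entries, key=lambda e: e.speech_iq, reverse=True)
    return [
        (MEDALS[i] if i < len(MEDALS) else str(i + 1), entry)
        for i, entry in enumerate(ordered)
    ]


def summarize(entries: list[Entry]) -> dict:
    """Recompute summary fields like those under "Leaderboard Statistics"."""
    if not entries:
        raise ValueError("leaderboard is empty")
    top = rank_entries(entries)[0][1]
    best_by_type: dict[str, float] = {}
    for entry in entries:
        best_by_type[entry.model_type] = max(
            entry.speech_iq, best_by_type.get(entry.model_type, float("-inf"))
        )
    return {
        "top_performer": top.setup,
        "highest_score": top.speech_iq,
        "best_by_type": best_by_type,
        "total_models": len(entries),
    }


# Hypothetical rows with made-up scores, for illustration only (not leaderboard results).
rows = [
    Entry("model_a", "Agentic", 100.0),
    Entry("model_b", "End2End", 95.0),
    Entry("model_c", "Agentic", 90.0),
    Entry("model_d", "End2End", 85.0),
]
for label, entry in rank_entries(rows):
    print(label, entry.setup, entry.speech_iq)
print(summarize(rows))
```

Keying the per-type maxima by a label string keeps the sketch open to further model families without changing `summarize`.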