Merged Summary: Return Status vs Layer Comparison

Detailed Comparison for llm_comparison_results_20250503_153935_run4.json

Model                             Query 1  Query 2  Query 3  Query 4  Query 5  Query 6  Query 7
deepseek-chat-v3-0324             ✘✘       ✘✘       ✘✘       ✘✘       ✔✔       ✔✔       ✔✔
deepseek-prover-v2                ✔✔       ✘✘       ✔✘       ✔✔       ✘✘       ✔✔       ✔✔
deepseek-r1                       ✔✘       ✔✔       ✔✘       ✔✘       ✔✘       ✔✔       ✔✔
gemini-2.0-flash-001              ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
gemini-2.5-flash-preview          ✔✘       ✔✔       ✔✔       ✔✔       ✔✔       ✘✘       ✔✔
gemini-2.5-pro-preview-03-25      ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
gemma-3-12b-it                    ✔✔       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
gemma-3-27b-it                    ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
llama-4-maverick                  ✘✘       ✘✘       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
llama-4-scout                     ✘✘       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
phi-4-reasoning-plus              ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘
phi-4-reasoning                   ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘
llama-3.1-nemotron-ultra-253b-v1  ✔✘       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
llama-3.3-nemotron-super-49b-v1   ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘       ✘✘
gpt-4.1                           ✔✘       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
gpt-4.1-mini                      ✔✔       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
gpt-4.1-nano                      ✔✘       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
gpt-4o-mini                       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
o4-mini                           ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
o4-mini-high                      ✔✔       ✔✔       ✔✘       ✔✔       ✔✔       ✔✔       ✔✔
qwen3-14b                         ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
qwen3-235b-a22b                   ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
qwen3-30b-a3b                     ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
qwen3-32b                         ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔       ✔✔
qwen3-8b                          ✔✔       ✔✔       ✘✘       ✔✔       ✔✔       ✔✔       ✔✔
Legend:
✔✔  Success & Correct
✔✘  Success but Layers Incorrect/Missing
✘✘  Error (Layer Correctness N/A)
-   Data Not Available
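
Each cell packs two checks into one code: the first symbol is the return status and the second is the layer comparison. As a minimal sketch of how a row of these codes could be decoded and tallied, the Python below maps each cell to a label taken from the legend. The names `LEGEND`, `decode_row`, and the label strings are hypothetical, and treating `-` as a single-character cell is an assumption; the sample row is Query 7 as it reads in the comparison above.

```python
from collections import Counter

# Meaning of each cell code, taken from the legend.
# The label strings themselves are illustrative, not from the report.
LEGEND = {
    "✔✔": "success_correct",
    "✔✘": "success_layers_incorrect_or_missing",
    "✘✘": "error",
    "-": "not_available",
}


def decode_row(symbols: str) -> list[str]:
    """Split a run of status symbols into cells and label each one."""
    cells = []
    i = 0
    while i < len(symbols):
        if symbols[i] == "-":
            # Assumption: "no data" occupies a single character.
            cells.append(LEGEND["-"])
            i += 1
            continue
        pair = symbols[i:i + 2]
        if pair not in LEGEND:
            # For example "✘✔", which the legend does not define.
            raise ValueError(f"unexpected cell {pair!r} at offset {i}")
        cells.append(LEGEND[pair])
        i += 2
    return cells


# Query 7 row: 10 correct, 2 errors, 1 correct, 1 error, 11 correct.
query_7 = "✔✔" * 10 + "✘✘" * 2 + "✔✔" + "✘✘" + "✔✔" * 11
statuses = decode_row(query_7)
tally = Counter(statuses)

print(len(statuses), "models")
for label, count in tally.items():
    print(f"{label}: {count}")
```

Rejecting undefined pairs instead of guessing keeps a misaligned row from silently shifting every later model's result by one column.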

Per-Run Merged Performance Pies

[Figures: merged performance pie charts for Run 1, Run 2, Run 3, and Run 4]