Geospatial Layer Picking Accuracy
Selected Layer vs Ground Truth Summary
- Total Runs: 4
- Total Queries: 28 (7 unique queries × 4 runs)
- Total Unique Models: 25
All Models vs Queries (Layer Comparison)
| Query | deepseek-chat-v3-0324 | deepseek-prover-v2 | deepseek-r1 | gemini-2.0-flash-001 | gemini-2.5-flash-preview | gemini-2.5-pro-preview-03-25 | gemma-3-12b-it | gemma-3-27b-it | llama-4-maverick | llama-4-scout | phi-4-reasoning-plus | phi-4-reasoning | llama-3.1-nemotron-ultra-253b-v1 | llama-3.3-nemotron-super-49b-v1 | gpt-4.1 | gpt-4.1-mini | gpt-4.1-nano | gpt-4o-mini | o4-mini | o4-mini-high | qwen3-14b | qwen3-235b-a22b | qwen3-30b-a3b | qwen3-32b | qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of litter bins within Sambro, Halifax | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Number of litter bin and boat facilities within Sambro, Halifax | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| World's most populous city | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Total number of communities in Halifax Regional Municipality | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| Number of boat facilities in Halifax | ✘ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✘ |
| All the available datasets in Halifax | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✘ | ✔ | ✘ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| All the available datasets | ✘ | ✔ | ✘ | ✔ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✘ | ✘ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Combined Layer Comparison Table (All Runs)
| Query | deepseek-chat-v3-0324 | deepseek-prover-v2 | deepseek-r1 | gemini-2.0-flash-001 | gemini-2.5-flash-preview | gemini-2.5-pro-preview-03-25 | gemma-3-12b-it | gemma-3-27b-it | llama-4-maverick | llama-4-scout | phi-4-reasoning-plus | phi-4-reasoning | llama-3.1-nemotron-ultra-253b-v1 | llama-3.3-nemotron-super-49b-v1 | gpt-4.1 | gpt-4.1-mini | gpt-4.1-nano | gpt-4o-mini | o4-mini | o4-mini-high | qwen3-14b | qwen3-235b-a22b | qwen3-30b-a3b | qwen3-32b | qwen3-8b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All the available datasets | ○○○○ | ✔○✔✔ | ✔✔✔✘ | ✔✔✔✔ | ✔✔✘✘ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔○○○ | ○○○○ | ○○○○ | ○○○○ | ✔✔✔○ | ○○○○ | ✔✔✔✘ | ✔✔✔✔ | ✘○✘✘ | ✘✘✘✘ | ✔✔✔✔ | ○✔✔✔ | ✔✔✔✔ | ○✔○✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
| All the available datasets in Halifax | ○○✔○ | ✔○✔○ | ✔✘✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔○✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔○○ | ✔○○✔ | ○○○○ | ○○○○ | ✔✔✔✔ | ○✘○○ | ✔✔✔✔ | ✔✔✔✔ | ○✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
| Number of boat facilities in Halifax | ✔○○○ | ○✘✘✘ | ✘✘✘✘ | ✔✔✔✔ | ✔✔✔✔ | ✘✘✘✔ | ✘✘✘✘ | ✔✔✔✔ | ✔✔✔✔ | ✘✘✘✘ | ○○○○ | ○○○○ | ✔✘✘✘ | ○○○○ | ✘✘✘✘ | ✘✘✘✘ | ✘✘✘✘ | ✔✔✔✔ | ✔✔✔✔ | ✔✘✘✘ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✘✔✔✔ | ✔✔✔○ |
| Number of litter bin and boat facilities within Sambro, Halifax | ✔○○○ | ✔✔✔✔ | ✔✔✘✘ | ✔✔✔✔ | ○✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔○✔✔ | ✔✔✔✔ | ○○○○ | ○○○○ | ✔✔✔✔ | ○○○○ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
| Number of litter bins within Sambro, Halifax | ○○✔✔ | ✘✔✔○ | ✔✔✔✘ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ○○○○ | ○○○○ | ✔✔✔✔ | ○○○○ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
| Total number of communities in Halifax Regional Municipality | ○✔○✔ | ✔✔✔✔ | ✘✔✔✔ | ✔✔✔✔ | ✔✔○○ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ○○○○ | ○○○○ | ✔✔✔✔ | ○○○○ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
| World's most populous city | ○○✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ○○○✔ | ✔✔✔✔ | ○○○○ | ○○○○ | ✔✔✔✔ | ○○○○ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ | ✔✔✔✔ |
Each ✔ indicates a correct layer selection for that model/query in a run.
Each ✘ indicates an incorrect layer selection.
Each ○ indicates no data for that run.
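The legend can be turned into a per-cell accuracy figure. The following is a minimal sketch, not the script that produced this report: the helper names `parse_row` and `cell_accuracy` are hypothetical, and it assumes that runs with no data are excluded from the denominator rather than counted as failures.

```python
from typing import Dict, Optional

CORRECT = "✔"
INCORRECT = "✘"


def parse_row(header: str, row: str) -> Dict[str, str]:
    """Map each model name in a pipe-table header to its cell in one row."""
    names = [c.strip() for c in header.strip().strip("|").split("|")]
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    # The first column is the query text, so skip it on both sides.
    return dict(zip(names[1:], cells[1:]))


def cell_accuracy(cell: str) -> Optional[float]:
    """Fraction of correct picks among runs that have data.

    Assumption: any symbol other than ✔ or ✘ is treated as "no data"
    and ignored. Returns None when no run has data.
    """
    scored = [mark for mark in cell.strip() if mark in (CORRECT, INCORRECT)]
    if not scored:
        return None
    return scored.count(CORRECT) / len(scored)


# Cells copied from the "All the available datasets" row above.
print(cell_accuracy("✔✔✔✘"))  # deepseek-r1: 3 of 4 scored runs -> 0.75
print(cell_accuracy("✔○✔✔"))  # deepseek-prover-v2: 3 of 3 scored runs -> 1.0
print(cell_accuracy("○○○○"))  # phi-4-reasoning: no data -> None
```

Excluding no-data runs keeps a model that never returned a result distinguishable from one that consistently picked the wrong layer. Counting ○ as a failure instead would be an equally defensible convention, but it would score both cases as 0.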