GeoAI Insights Dashboard
Welcome to the GeoAI Insights Dashboard. This platform, developed as a GIS program capstone project at COGS NSCC, offers a suite of tools to evaluate and compare the performance of various Large Language Models (LLMs). The focus is on applying LLMs to geospatial analysis and on their ability to help users perform complex database queries using natural language. Explore detailed metrics, visualizations, and rankings to understand each model's strengths, weaknesses, and overall suitability for specific GeoAI tasks. Navigate through the sections dedicated to core performance, specialized SQL evaluations, and comparative model rankings to gain actionable insights.
Section 1: Core Performance Metrics
Structured Response Reliability
Evaluate each model's ability to follow instructions and return a well-formed JSON object as requested. Success in this context means the model produced valid, parseable JSON, indicating reliability in structured data output.
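As an illustration, a check like this can be as simple as attempting to parse the response and confirming the expected keys are present. The following is a minimal Python sketch; the function name and the required "layers" key are assumptions for illustration, not the dashboard's actual pipeline.

    import json

    def is_valid_json_response(raw_response, required_keys=("layers",)):
        """Return True if raw_response parses to a JSON object
        containing every key in required_keys (hypothetical keys)."""
        try:
            parsed = json.loads(raw_response)
        except json.JSONDecodeError:
            return False
        # Require a JSON object, not a bare list, number, or string.
        if not isinstance(parsed, dict):
            return False
        return all(key in parsed for key in required_keys)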
Geospatial Layer Picking Accuracy
Analyze the accuracy of models in selecting the correct geospatial layers for given tasks. This summary shows correctness rates, including how often models chose the right layers, missed required layers, or selected extra, unnecessary layers.
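Conceptually, these rates fall out of a set comparison between the layers a model selected and the layers the task requires. A minimal sketch, assuming layer names are compared as plain strings:

    def compare_layer_selection(selected, expected):
        """Classify a model's layer choices against the expected set,
        returning the correct, missing, and extra layers."""
        selected_set, expected_set = set(selected), set(expected)
        return {
            "correct": selected_set & expected_set,   # right layers chosen
            "missing": expected_set - selected_set,   # required layers skipped
            "extra": selected_set - expected_set,     # unnecessary layers added
        }

    # Example: the model picked one wrong layer and missed a required one.
    result = compare_layer_selection(["roads", "buildings"], ["roads", "hydrology"])
    # result == {"correct": {"roads"}, "missing": {"hydrology"}, "extra": {"buildings"}}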
Combined Performance Overview
Get a combined overview of model performance that integrates the structured-response return status with layer-selection correctness, providing a holistic view of model capability across both evaluation criteria.
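One plausible way to combine the two criteria, sketched below by reusing the two hypothetical helpers from the previous examples, is to count a response as fully successful only when it is well-formed JSON and selects exactly the expected layers.

    import json

    def overall_success(raw_response, expected_layers):
        """Combine structured-response reliability and layer-selection
        correctness into one pass/fail outcome (illustrative only)."""
        if not is_valid_json_response(raw_response):
            return False
        selected = json.loads(raw_response).get("layers", [])
        outcome = compare_layer_selection(selected, expected_layers)
        # Fully successful only when no required layer is missing
        # and no unnecessary layer was added.
        return not outcome["missing"] and not outcome["extra"]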
Section 2: Specialized Evaluations
Dive into the SQL-focused evaluations, which measure how well models translate natural-language questions into executable database queries, covering both the correctness and the efficiency of the generated SQL.
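A common way to judge generated SQL, and a reasonable guess at what a correctness check like this involves, is execution matching: run the generated query and a reference query against the same database and compare their result sets. A minimal sketch, assuming an SQLite backend and a hypothetical function name:

    import sqlite3
    from collections import Counter

    def sql_results_match(db_path, generated_sql, reference_sql):
        """Execute both queries against the same database and compare
        their result multisets, ignoring row order (illustrative only)."""
        with sqlite3.connect(db_path) as conn:
            try:
                generated_rows = conn.execute(generated_sql).fetchall()
            except sqlite3.Error:
                return False  # generated SQL failed to parse or execute
            reference_rows = conn.execute(reference_sql).fetchall()
        return Counter(generated_rows) == Counter(reference_rows)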
Section 3: Model Rankings
Core Performance Rankings
See how models rank based on overall success rates or on specific correctness metrics for general tasks; this view helps identify the top-performing models at a glance.
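As a sketch of how such a ranking could be computed from per-task results (the DataFrame columns here are assumptions, not the dashboard's actual schema):

    import pandas as pd

    # Hypothetical per-task results: one row per (model, task) evaluation.
    results = pd.DataFrame({
        "model": ["A", "A", "B", "B", "C", "C"],
        "success": [True, True, True, False, False, False],
    })

    # Rank models by overall success rate, highest first.
    ranking = (
        results.groupby("model")["success"]
        .mean()
        .sort_values(ascending=False)
        .rename("success_rate")
    )
    print(ranking)  # A: 1.0, B: 0.5, C: 0.0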
Specialized Evaluations Rankings
Explore model rankings based on performance in SQL generation and evaluation tasks, focusing specifically on SQL-related correctness and efficiency.