Case Study: Building a Digital Data Analyst in 8 Weeks
Executive Summary
Spillwave successfully designed and deployed a Digital Data Analyst solution for a financial services client in just 8 weeks. The system transforms natural language questions into DAX (Data Analysis Expressions) queries that extract data from Microsoft's SSAS Analysis Services cubes, enabling financial analysts to access critical business insights without specialized technical expertise. Despite encountering significant challenges with the complex DAX query language—which lacks substantial representation in foundation model training data—the team delivered a production-ready application achieving a 70-80% query success rate by project conclusion.
Watch the conversation on YouTube
Client Challenge
The client required a solution that would allow their financial analysts to:
- Bypass the steep learning curve of DAX query syntax
- Eliminate the need for custom BI report creation for each new business question
- Reduce dependency on technical resources and their associated lead times
- Enable self-service data exploration through natural language
The objective was to create an intuitive conversational interface capable of interpreting natural language questions about financial data, converting them into syntactically valid DAX queries, and returning accurate, properly formatted results from their existing SSAS cube.
Technical Environment
- Data Source: Microsoft SSAS Analysis Services cube containing hierarchical financial data
- Query Language: DAX (Data Analysis Expressions) - Microsoft's specialized analytical query language with complex time intelligence functions and dimensional modeling concepts
- Backend Stack: Python 3.11, FastAPI, Azure OpenAI integration (GPT-4), pydantic data validation
- Frontend Implementation: React with Emotion/styled components, Tailwind CSS, and a debug panel for query examination
- Vector Database: Custom pgVector implementation for embedding storage and semantic search with hybrid query capabilities
- Deployment Target: Azure cloud infrastructure with Function Apps and Web Apps
- Authentication: Azure AD integration with JWT token validation and SSAS effective username impersonation
The 8-Week Journey
Week 1-2: Discovery and Foundation Building
The team initially underestimated the challenge based on their previous success with natural language to SQL conversions. They soon discovered multiple critical obstacles:
- DAX syntax presents significantly higher complexity than SQL, with unintuitive semantics around filter contexts, time intelligence, and table relationships
- Foundation language models demonstrated limited knowledge of DAX constructs due to sparse representation in training data
- The dimensional nature of SSAS cubes required specialized context about measures vs. dimensions to generate meaningful queries
- Early prototypes struggled to generate even syntactically valid DAX, highlighting the need for a more robust approach
As engineer Rick noted: "Chris had created a prototype that worked with SQL and it was beautiful. It just worked. It was amazing. And then we tried to use the same techniques with DAX and it was like... it wouldn't even run."
Week 3-4: Building the RAG Infrastructure
The team developed a sophisticated Retrieval-Augmented Generation (RAG) system to compensate for the LLMs' lack of DAX knowledge:
- Vector Database Architecture:
- Implemented a three-tiered content representation schema with projects, files, and chunks for flexible organization
- Built semantic search capabilities using cosine similarity on text embeddings
- Developed metadata filtering for hybrid search to constrain results by project, type, and other attributes
- Created an optimized PgVector schema with proper indexing for fast similarity searches
- Multi-Source Data Ingestion Pipeline:
- Extracted complete Data Definition Language (DDL) from the SSAS cube including tables, measures, and dimensions
- Collected curated "ground truth" question-query pairs from client examples (20+ pairs)
- Analyzed 7,000+ PowerBI query logs, identifying 122 in-scope queries for additional examples
- Implemented automatic business context generation for cube objects using LLMs
- Built a "hypothetical answer" generator to create natural language descriptions from DAX queries
- Advanced Prompt Engineering:
- Developed specialized prompts with detailed DAX-specific syntax rules and best practices
- Created contextual few-shot examples based on similarity to the current query
- Designed workflow-based prompt templates with clear instruction sequences
- Implemented prompt components for error correction and disambiguation
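The hybrid retrieval described above can be sketched in a few lines. This is a minimal, self-contained illustration of the idea only: the production system used pgVector with proper indexing, whereas here chunks, embeddings, and the `Chunk`/`hybrid_search` names are all hypothetical stand-ins, with cosine similarity computed in plain Python.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query_embedding, chunks, metadata_filter=None,
                  threshold=0.5, top_k=3):
    """Rank chunks by cosine similarity, constrained by metadata attributes."""
    # Metadata filtering first (the "hybrid" part): project, type, etc.
    candidates = [
        c for c in chunks
        if not metadata_filter
        or all(c.metadata.get(k) == v for k, v in metadata_filter.items())
    ]
    scored = [(cosine_similarity(query_embedding, c.embedding), c)
              for c in candidates]
    # Drop matches below the configurable similarity threshold.
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:top_k]
```

In the real system the similarity search and metadata predicate would both run inside Postgres via pgVector; the in-memory version above just makes the ranking-plus-filtering logic explicit.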
Week 5-6: Query Generation Pipeline Development
The team initially explored a fully agentic approach using the Pydantic AI framework with specialized agents for preprocessing, query construction, and verification. However, they faced challenges with the agent's decision-making process, as Chris explained:
"My version of it would just spin and spin. It would just spend all its time in the first couple of tools and never really decide that it was good to go."
While optimistic about the capabilities of an agentic approach for future versions, such as a fully featured "analyst" mode, and in particular the "critic" role, the team decided to simplify given the short timeframe.
This led to a strategic pivot toward a more deterministic workflow approach:
- Workflow System Architecture:
- Implemented a structured workflow for end-to-end query handling
- Created a multi-stage pipeline for query analysis, RAG retrieval, generation, validation, and execution
- Developed fallback mechanisms with degradation paths for when primary queries failed
- Added conversation history tracking to maintain context across interactions
- Multi-Layer Validation System:
- Built an LALR(1) grammar-based parser for fast deterministic DAX syntax validation
- Implemented semantic validation using LLMs to check query-intent alignment
- Created execution validation to verify queries against the actual data model
- Developed error categorization and recovery mechanisms for different failure types
- Response Generation and Formatting:
- Created specialized formatting utilities for financial data types (currency, percentages, dates)
- Implemented an intelligent schema-aware formatter that applied appropriate formatting based on column semantics
- Added clarification mechanisms for ambiguous questions with suggested alternatives
- Built a debug interface for examining the complete query generation process
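To make the multi-layer validation concrete, here is a toy sketch of the fast deterministic syntax layer. The actual system used a full LALR(1) grammar-based parser; the checks below (an `EVALUATE` statement, balanced parentheses, terminated string literals) are a deliberately simplified stand-in, and the function name is hypothetical.

```python
def quick_dax_checks(query: str) -> list[str]:
    """Cheap deterministic pre-checks; a toy stand-in for the real LALR(1) parser."""
    errors = []
    stripped = query.strip()
    # Every DAX query submitted for execution must contain an EVALUATE statement.
    if "EVALUATE" not in stripped.upper():
        errors.append("missing EVALUATE statement")
    # Balanced-delimiter check for (), which DAX function calls use heavily.
    depth = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append("unbalanced parentheses: ')' before '('")
                break
    if depth > 0:
        errors.append("unbalanced parentheses: unclosed '('")
    # DAX string literals use double quotes; an odd count suggests one is unterminated.
    if stripped.count('"') % 2 != 0:
        errors.append("unterminated string literal")
    return errors
```

Running cheap checks like these before the LLM-based semantic validation and the live execution step means obviously malformed candidates fail fast without spending tokens or a round trip to the cube.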
Week 7-8: Integration and Refinement
The final weeks saw accelerated progress as the team integrated all components and focused on improving query accuracy:
- End-to-End Testing and Integration:
- Connected to test cube environment through a C# Function App using ADOMD.NET
- Implemented row-level security through effective username impersonation
- Validated query generation against 50+ test scenarios
- Built error handling and recovery for network, authentication, and execution issues
- Performance and Accuracy Optimization:
- Fine-tuned similarity thresholds for RAG retrieval based on empirical testing
- Optimized prompt construction to maximize relevant examples within token limits
- Implemented query categorization to apply specialized handling for different query types
- Created a query success tracking system for continuous improvement
- User Experience Finalization:
- Completed the React-based chat interface with response formatting
- Added an advanced debugging panel for developers to inspect query generation
- Implemented comprehensive query logging for future analysis and fine-tuning
- Created demonstration scripts showcasing various query capabilities
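The schema-aware formatting mentioned above can be illustrated with a small sketch. The semantic types, function names, and format choices here are assumptions for illustration; the real formatter derived column semantics from cube metadata.

```python
from datetime import date

def format_value(value, semantic_type: str) -> str:
    """Apply business-friendly formatting based on a column's semantic type."""
    if value is None:
        return "N/A"
    if semantic_type == "currency":
        return f"${value:,.2f}"       # thousands separators, two decimals
    if semantic_type == "percentage":
        return f"{value:.1%}"         # 0.123 -> "12.3%"
    if semantic_type == "date" and isinstance(value, date):
        return value.strftime("%b %d, %Y")
    return str(value)

def format_row(row: dict, schema: dict) -> dict:
    """`schema` maps column name -> semantic type inferred from cube metadata."""
    return {col: format_value(val, schema.get(col, "text"))
            for col, val in row.items()}
```

The key design point is that formatting decisions live in one place keyed by column semantics, so raw cube output never reaches the chat interface unformatted.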
Technical Solution Architecture
Vector RAG Implementation
The team built a sophisticated Vector RAG system that stored and retrieved three key data types:
- DDL Contextualized: Complete cube metadata enriched with business context
- Tables, measures, dimensions, and relationships
- Business descriptions and usage patterns
- Data types and formatting requirements
- Ground Truth Questions: Client-provided examples with corresponding DAX queries
- Natural language questions
- Correct DAX implementations
- Tables and columns referenced
- Business context explanations
- Query Logs: Historical PowerBI queries with extracted metadata
- DAX queries with execution statistics
- Tables and measures utilized
- Natural language descriptions (AI-generated through reverse engineering)
This multi-source approach provided the LLM with:
- Precise schema information with business meaning
- Relevant query examples for few-shot learning
- Domain-specific financial terminology context
- Query patterns proven to work against the target cube
Workflow-Based Query Processing Pipeline
Rather than using a fully agentic approach, the team implemented a deterministic workflow with the following components:
- Query Analysis and Classification:
- Determines if the question is casual or a genuine data question (called "the pithy question detector")
- Identifies key entities, metrics, and time frames
- Categorizes the question type (e.g., aggregation, comparison, trend)
- Handles conversation context from previous interactions
- Context Retrieval and Preparation:
- Performs semantic search for relevant metadata with configurable thresholds
- Retrieves similar query examples from ground truth and logs
- Merges and ranks retrieved context based on relevance
- Multi-Candidate Query Generation:
- Generates multiple candidate DAX queries with variations
- Includes parsing and formatting for proper structure
- Applies DAX best practices from prompt guidelines
- Creates query explanations for user understanding
- Multi-Stage Validation:
- Syntax validation using a custom LALR(1) parser
- Semantic validation checking query-intent alignment
- Execution validation against the live cube
- Error analysis and categorization for recovery
- Results Processing and Formatting:
- Executes validated queries against the SSAS cube
- Processes and formats returned data appropriately
- Applies business-friendly formatting based on data types
- Generates explanations of the results in natural language
- Conversation Management:
- Maintains conversation history for context
- Handles follow-up questions with reference to previous queries
- Suggests related questions for further exploration
- Logs interactions for future training and improvement
LLM Integration and Prompt Engineering
The system relied on sophisticated prompt engineering techniques to compensate for the LLMs' lack of DAX knowledge:
- Structured Few-Shot Examples:
- Dynamically selected based on query similarity
- Included both the natural language question and corresponding DAX
- Demonstrated proper handling of specific financial concepts
- Showcased correct syntax patterns for complex operations
- DAX-Specific Rules and Guidelines:
- Detailed syntax requirements and best practices
- Common pitfalls and how to avoid them
- Performance considerations for query execution
- Special handling for time intelligence functions
- Contextual Schema Information:
- Relevant tables, measures, and dimensions for the current query
- Business descriptions of what each element represents
- Relationships between tables and hierarchies
- Data types and formatting requirements
- Progressive Error Correction:
- Detailed error messages from validation steps
- Suggestions for fixing common DAX mistakes
- Examples of correct patterns to follow
- Guidance on alternative approaches when needed
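The progressive error correction above amounts to a bounded repair loop: validation errors are fed back into the next generation prompt. A minimal sketch, assuming hypothetical `llm_generate` and `validate` callables supplied by the caller:

```python
def generate_with_repair(question, llm_generate, validate, max_attempts=3):
    """Regenerate with validation feedback until the query passes or attempts run out.

    llm_generate(question, feedback) -> candidate query string
    validate(query) -> list of error messages (empty list means valid)
    """
    feedback = ""
    for attempt in range(max_attempts):
        query = llm_generate(question, feedback)
        errors = validate(query)
        if not errors:
            return query, attempt + 1
        # Errors become context for the next attempt (progressive correction).
        feedback = "Previous attempt failed: " + "; ".join(errors)
    return None, max_attempts
```

Bounding the attempts is the important safeguard: unlike the agentic prototype, a query that cannot be repaired within a few tries falls through to the workflow's degradation path instead of looping.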
Results and Value Delivered
In just 8 weeks, the team delivered a functioning Digital Data Analyst with impressive capabilities:
- Understanding complex natural language financial questions with contextual awareness
- Generating valid DAX queries with a 70-80% success rate across diverse query types
- Providing business-friendly explanations of query results and their meaning
- Learning from past interactions to continuously improve accuracy
- Supporting follow-up questions and conversational exploration
The client realized immediate business value through:
- Significant Time Savings: Financial analysts could instantly retrieve data that previously required custom report creation, reducing wait times from days to seconds
- Knowledge Democratization: Financial data became accessible to team members without DAX expertise, expanding analytical capabilities
- Exploratory Data Analysis: Analysts gained the ability to iteratively explore data through natural conversation, leading to deeper insights
- Technical Debt Reduction: The organization dramatically reduced the need for custom reports and dashboards for one-off questions
- Improved Decision Making: Faster access to data enabled more timely and informed business decisions
Key Insights and Learnings
1. Client Data Context is Critical but Challenging to Obtain
The team discovered that gathering business-specific metadata was essential but difficult to properly scope and obtain. They developed an innovative approach where:
- Initial metadata descriptions were generated using LLMs
- Client reviewed and refined these descriptions for accuracy
- The most critical entities received priority attention
However, as Rick explained: "If you ask the client, 'Here's 400 fields, please go type the business purpose, value, and intent for each of these fields,' it's a hard sell, even though it's their project and that's what they want."
The team concluded that future projects should include a more structured, phased approach to metadata collection, with clearer guidance on what makes metadata most valuable to the AI system.
2. Query Examples Provide the Highest ROI for RAG Effectiveness
While schema metadata was important, the team found that actual query examples delivered the greatest improvement in system performance:
"It seemed like the most bang for the buck that we got out of all of this was, here's what a good query looks like that does the type of thing that's being asked."
The combination of:
- Natural language question
- Corresponding DAX query
- Tables and columns used
- Business context
provided the LLM with concrete patterns to follow, dramatically improving query generation accuracy compared to abstract rules and guidelines alone.
3. Development Progress Exhibited Non-Linear Improvement Patterns
Progress was initially slow while building the foundation, followed by rapid improvements in later weeks. The team observed that performance improvements often came in significant jumps rather than gradual increments, particularly after adding certain types of examples or refining prompt components.
When quality improved, it tended to jump dramatically rather than creep upward. This non-linear improvement pattern underscores the importance of persistence through initial development phases that may show limited visible progress.
4. Domain-Specific Tasks Require Specialized Knowledge Augmentation
The team's experience highlighted a crucial insight about current LLMs: even the most advanced models have significant knowledge gaps in specialized domains like DAX:
"The foundation models don't know much about DAX because nobody's publishing stuff about DAX on the web."
This reinforced the value of their RAG approach, which effectively augmented the model's knowledge with domain-specific information. The application's design includes comprehensive logging of all question/prompt/DAX response interactions, with plans to add user feedback mechanisms, creating a valuable dataset for future fine-tuning.
Future Improvement Opportunities
Based on the team's analysis, four promising improvement paths have been identified:
1. Fine-Tuning on Domain-Specific Dataset
Creating a specialized model through fine-tuning on DAX examples could provide substantial accuracy improvements:
- Technical Approach:
- Build a comprehensive corpus of English-to-DAX query pairs (500+ examples)
- Select a base model already strong in SQL-like reasoning (e.g., GPT-4o or Claude)
- Implement fine-tuning with careful validation to prevent overfitting
- Include diverse query patterns representing various financial analysis scenarios
- Expected Benefits:
- 15-20% improvement in query success rate
- Reduced token usage through more efficient prompting
- Better handling of DAX-specific syntax and functions
- Lower latency due to reduced need for extensive context
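One common way to package such a corpus is chat-style JSONL, the format OpenAI's fine-tuning API accepts. The helper names and the system prompt below are illustrative assumptions, not the team's actual tooling:

```python
import json

def to_finetune_record(question: str, dax: str,
                       system_prompt: str = "You translate financial questions into DAX.") -> dict:
    """Build one chat-format training example from an English/DAX pair."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
            {"role": "assistant", "content": dax},
        ]
    }

def write_jsonl(pairs, path: str) -> None:
    """Write (question, dax) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for question, dax in pairs:
            f.write(json.dumps(to_finetune_record(question, dax)) + "\n")
```

Because the application already logs every question/prompt/DAX interaction, successful exchanges could be exported through a helper like this with little extra effort.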
2. Graph-Based RAG Implementation
Evolving the pgVector approach to a knowledge graph-based retrieval system would better represent complex relationships:
- Technical Approach:
- Build a knowledge graph encoding cube schema relationships
- Add business logic and semantic connections between entities
- Implement validation rules for DAX function usage
- Develop specialized graph traversal for context retrieval
- Expected Benefits:
- More precise context retrieval for complex queries
- Better modeling of table relationships
- Improved understanding of valid DAX operations
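At its simplest, graph-based context retrieval is a bounded traversal over schema relationships. The sketch below uses a plain adjacency dict and breadth-first search; the example schema and function name are hypothetical, and a production system would use a real graph store with typed edges.

```python
from collections import deque

def related_entities(graph: dict, start: str, max_hops: int = 2) -> set:
    """BFS over cube-schema relationships to collect entities within max_hops of start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return seen - {start}
```

The advantage over flat vector retrieval is that relationship paths (fact table to dimension to hierarchy) come back as connected context, not as independently scored chunks.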
3. Test-Time Compute Adjustment System
Implementing a comprehensive feedback loop would enable continuous system improvement:
- Technical Approach:
- Create a monitoring system for query success/failure patterns
- Capture explicit user feedback on result quality
- Implement automated regression testing for new examples
- Develop an RLHF pipeline for model refinement
- Expected Benefits:
- System that improves automatically through usage
- Adaptation to changing business terminology
- Increasing accuracy for frequently used query patterns
- Data collection for further fine-tuning iterations
4. Financial Analyst Activity Classification
Developing specialized workflows for different financial analysis tasks would enhance domain-specific performance:
- Technical Approach:
- Identify core financial analysis activities (e.g., variance analysis, trend analysis)
- Create specialized context and examples for each activity
- Implement activity-specific query templates
- Build a classifier to route questions to appropriate handlers
- Expected Benefits:
- More precise handling of domain-specific questions
- Better alignment with financial analysis methodologies
- Improved response quality for complex analytical tasks
- More natural follow-up question handling
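A first-cut router for this classification could be as simple as keyword matching, with an LLM classifier as the eventual upgrade. The activity names and keyword lists below are illustrative assumptions:

```python
# Hypothetical activity taxonomy; a real system would derive this with the client.
ACTIVITY_KEYWORDS = {
    "variance_analysis": ["variance", "vs budget", "versus plan", "difference"],
    "trend_analysis": ["trend", "over time", "growth", "month over month"],
    "ranking": ["top", "bottom", "highest", "lowest"],
}

def classify_activity(question: str, default: str = "general") -> str:
    """Route a question to the first activity whose keywords it mentions."""
    q = question.lower()
    for activity, keywords in ACTIVITY_KEYWORDS.items():
        if any(k in q for k in keywords):
            return activity
    return default
```

Each activity label would then select its own prompt template, few-shot examples, and DAX patterns, which is the specialization the section above describes.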
Conclusion
The Spillwave team demonstrated that a high-value, AI-powered business solution could be built in just 8 weeks, even when facing significant technical challenges. By combining RAG techniques with carefully designed workflows and advanced prompt engineering, they created a Digital Data Analyst that gives business users a novel way to explore complex financial data: another "tool in the arsenal" for these users.
This project exemplifies Business Process Insourcing at its best—taking a specialized technical task (DAX query writing) and making it accessible to business users through AI. The iterative development approach, strategic pivots when needed, and focus on delivering immediate business value enabled the team to achieve remarkable results in a constrained timeframe.
The system's architecture balances immediate utility with extensibility, providing a foundation for continuous improvement that will deliver increasing value over time. By capturing interaction data and user feedback, the solution is positioned to become even more accurate and capable through subsequent refinement cycles.