How we hit the top of the BIRD-SQL benchmark: fine-tuning and diverse RAG

April 22, 2024

We're excited to announce the release of our paper on arXiv describing the inner workings of Dubo-SQL. The paper covers both our v1 model, which hit the top of the BIRD-SQL leaderboard in November 2023, and our v2 model, which performs even better.

BIRD-SQL is the most realistic text-to-SQL benchmark, with 12,751 real user questions from 95 databases across dozens of industries. The questions are challenging, with human accuracy of 93%.

Dubo-SQL v1 hit #1 on the BIRD-SQL leaderboard, ahead of models from Tencent and Alibaba. We used a simple prompt with a GPT-3.5 Turbo model fine-tuned on the training set. For any SQL that resulted in a syntax error, we sent the error message back to the LLM for correction. The model is token-efficient and fast, with an inference cost 50x cheaper than reported for models with lower accuracy.
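The error-correction step can be pictured with a minimal sketch like the one below. It is an illustration under stated assumptions, not the Dubo-SQL code: `ask_llm` is a hypothetical callable that takes a prompt and returns SQL text, the prompt wording is invented, and SQLite stands in for whatever database is being queried.

```python
import sqlite3
from typing import Callable


def generate_sql_with_retry(
    question: str,
    conn: sqlite3.Connection,
    ask_llm: Callable[[str], str],  # hypothetical: prompt in, SQL text out
    max_retries: int = 2,
) -> str:
    """Ask the model for SQL; if the database rejects it, send the error back."""
    sql = ask_llm(f"Write a SQLite query to answer: {question}")
    for _ in range(max_retries):
        try:
            conn.execute(sql).fetchall()  # run the query to surface any error
            return sql
        except sqlite3.Error as exc:
            # Feed the failed query and the database's error message back to the model.
            prompt = (
                f"Question: {question}\n"
                f"This SQL failed:\n{sql}\n"
                f"Error message: {exc}\n"
                "Return a corrected query."
            )
            sql = ask_llm(prompt)
    # Retries exhausted: the last attempt is returned without being re-checked.
    return sql
```

Capping the number of retries keeps token usage bounded, which matters if low inference cost is a goal.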

Dubo V1 benchmark

Dubo-SQL v2 achieves higher performance on the BIRD-SQL dev set using GPT-4 Turbo with a novel approach to retrieval-augmented generation. We use cosine similarity to match each user question from the dev set with similar questions from the training set, but we force the example questions to be diverse, at the cost of lower similarity to the dev set question.
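To make the trade-off between similarity and diversity concrete, here is a small sketch. The exact selection rule used in Dubo-SQL v2 is described in the paper, not here, so this example substitutes a greedy, MMR-style (maximal marginal relevance) heuristic as a stand-in. The function names, the `diversity` weight, and the assumption that questions are already embedded as vectors are all illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_diverse_examples(
    query_vec: Sequence[float],
    example_vecs: List[Sequence[float]],
    k: int = 3,
    diversity: float = 0.5,  # 0.0 = pure similarity; higher = more diverse
) -> List[int]:
    """Greedily pick k example indices, penalizing near-duplicates of earlier picks."""
    selected: List[int] = []
    candidates = list(range(len(example_vecs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, example_vecs[i])
            # Similarity to the closest example that was already chosen.
            redundancy = max(
                (cosine(example_vecs[i], example_vecs[j]) for j in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `diversity=0.0` this reduces to plain top-k retrieval by cosine similarity. Raising it swaps near-duplicate examples for ones that are less similar to the question but cover different ground.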

Dubo V2 benchmark

Check out the paper and our GitHub repo for more details and let us know what you think! The models we share here are greatly simplified versions of the Dubo production code. The production version of Dubo can handle databases and documentation that are orders of magnitude larger than in BIRD and can have full conversations instead of one round of Q&A.

Announcing the Dubo SQL Editor

March 25, 2024

Data teams hate getting distracted from their strategic goals by ad hoc requests, and their business partners hate waiting on them for answers, or at least that was our experience in big tech. To help data scientists and their business partners alike, we’ve built Dubo, an AI-powered SQL editor. Dubo can be:

A copilot to help you write your SQL faster.

Dubo Copilot

Or a chatbot to write SQL for you.

Dubo Chatbot

Onboarding Dubo is easy:

  1. Connect your database. For teams with extra security concerns, share a text file of your db schema instead of a database connection.
  2. Upload database documentation.
  3. Write custom instructions describing your business.
  4. Tell Dubo when it makes a mistake. It takes notes to learn from your feedback.

Why isn’t this just another ChatGPT wrapper? Dubo set a record for text-to-SQL performance on the BIRD-SQL benchmark and remains the top-performing commercial model. The core model is enabled by a multi-step pipeline for writing the SQL that answers your question:

  1. Select tables and extract documentation to answer your question.
  2. Write draft SQL.
  3. Block any hallucinated columns or tables.
  4. Auto-correct syntax errors found on execution.
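One way to picture the step that blocks hallucinated columns or tables is to compile the draft query against an empty copy of the schema before it touches real data. The sketch below does this with SQLite's `EXPLAIN`, which prepares a statement without running it. This is an assumption about how such a check could work, not a description of Dubo's implementation, and `find_hallucinated_name` is a hypothetical helper that handles a single statement.

```python
import sqlite3
from typing import Optional


def find_hallucinated_name(schema_ddl: str, sql: str) -> Optional[str]:
    """Return SQLite's complaint if `sql` references an unknown table or column."""
    conn = sqlite3.connect(":memory:")  # empty scratch database, no real data
    try:
        conn.executescript(schema_ddl)  # recreate the tables from their DDL
        conn.execute("EXPLAIN " + sql)  # compile the query without executing it
        return None
    except sqlite3.OperationalError as exc:
        message = str(exc)
        if message.startswith(("no such table", "no such column")):
            return message
        raise  # other problems (e.g. syntax errors) are not this check's job
    finally:
        conn.close()
```

For example, with a schema containing only `orders (id, total)`, a draft that selects `price` from `orders` is flagged, while one that selects `total` passes.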

Try it out and let us know what you think!

Dubo Is Now SOC 2 Compliant

February 08, 2024

The Dubo platform from Mercator Technologies has received a clean SOC 2 attestation from Advantage Partners with automated evidence collection from our fellow YC company Vanta. This audit verifies that we adhere to the highest industry standard for security.

SOC 2, or System and Organization Controls 2, is a framework governed by the American Institute of Certified Public Accountants (AICPA). In a SOC 2 audit, an independent service auditor reviews an organization’s policies, procedures, and evidence to determine whether its controls are designed and operating effectively. A SOC 2 report communicates a company’s commitment to data security and protection of customer information.

If your company could benefit from having Dubo as an AI data analyst, either for internal use or embedded in your product, get in touch with us at founders@dubo.gg.

Launching the Dubo GPT

January 05, 2024

ChatGPT lets anyone ask data questions without being an expert in skills like Python or SQL. It still has a few shortcomings as a data analytics assistant:

  • Lack of context. An infinitely skilled AI wouldn’t be able to answer questions about your business if it lacked access to proprietary information, like how you define your metrics.
  • Data limitations. Sharing your data with ChatGPT is impractical if you have a corporate database with thousands of tables, since you’ll quickly exceed file size and context limits.
  • Limited SQL capabilities. By the best benchmarks, GPT-4 is well below human performance at writing SQL.

Dubo is designed to fix these shortcomings. Our new GPT brings Dubo to ChatGPT to combine Dubo’s skills as a data scientist with ChatGPT’s skills as a generalist. Ask a question and Dubo will write and execute the SQL and share the results. This has a few advantages over vanilla ChatGPT:

  • Database documentation. Dubo uses database documentation like your dbt metrics to give the AI context necessary to answer questions about your business.
  • Live database connections. You’re no longer limited to loading small CSVs. Dubo can select the right set of tables to answer questions about large corporate databases.
  • State-of-the-art SQL skills. Dubo is the best LLM-based tool for writing SQL, significantly better than the GPT-4 baseline.

The Dubo GPT lets you combine Dubo’s strength in writing SQL with ChatGPT’s native ability to make visualizations with Python.

Dubo + ChatGPT

Try it out by signing up for Dubo and chatting with the Dubo GPT.

Dubo ranks number 1 in accuracy on BIRD

November 27, 2023

We’re the #1 most accurate text-to-SQL model on the BIRD benchmarks.

Progress with BIRD

Early on, we noticed that many text-to-SQL benchmarks didn’t represent a real-world business environment. Our jobs as data scientists often included messy data, convoluted multi-part JOINs, and confusing WHERE clauses that filter nested JSON. Some well-known benchmarks mostly concerned a single table or small sets of JOINs. They lacked "ecological validity" – the evaluation failed to map to its real-world context.

The BIRD benchmarks are designed to be more representative of real-world SQL. Developed by researchers from the University of Hong Kong, Tsinghua University, MIT CSAIL, the University of Illinois at Urbana-Champaign, and elsewhere, BIRD contains phenomena like large tables with many values and considers data across industries from entertainment to healthcare. We think it is a much better representation of corporate databases.

We are pleased to have achieved this result, and thank the BIRD team for running the benchmark.

Meet the Dubo Slackbot

November 16, 2023

The biggest challenge with LLM-powered text-to-SQL tools is that they don’t know your business, its goals, or how it defines its metrics. Even if GPT-5 were released tomorrow and wrote perfect SQL, it wouldn’t be able to answer questions about your business without that context.

We’re solving that problem with Dubo, which learns about your business like any other data analyst who joins your team. Specifically, Dubo does the following:

  1. Scans your database schemas and query history. The database layout and previous queries provide rich information about data access patterns, e.g., common JOINs.
  2. Reads data documentation. We let users provide their data documentation to the LLM for retrieval-augmented generation.
  3. Takes direct feedback. When you correct Dubo’s mistakes, it takes notes so you only have to tell it once.
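As a rough illustration of how these three sources of context might be combined before a question reaches the model, the sketch below reads a schema from a SQLite database and stitches it together with documentation snippets and stored feedback notes. Everything here is hypothetical (`describe_schema`, `build_context`, the section labels), query-history mining is left out, and retrieval of the relevant documentation is assumed to have happened already.

```python
import sqlite3
from typing import List


def describe_schema(conn: sqlite3.Connection) -> str:
    """Summarize each table as `name(col type, ...)`, one table per line."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        cols = ", ".join(f"{col[1]} {col[2]}" for col in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)


def build_context(
    conn: sqlite3.Connection, docs: List[str], feedback_notes: List[str]
) -> str:
    """Combine schema, documentation snippets, and past corrections into one text."""
    parts = ["Schema:", describe_schema(conn)]
    if docs:
        parts += ["Documentation:"] + docs
    if feedback_notes:
        parts += ["Notes from past corrections:"] + feedback_notes
    return "\n".join(parts)
```

Keeping the feedback notes as plain text next to the schema is one simple way to make a correction stick: the same note is included every time a later question is asked.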

Today, we’re launching Dubo as a Slackbot. Anyone on your team can tag Dubo to join a conversation as a data analyst. Dubo will ask clarifying questions if it’s not sure and take your feedback, both positive and critical.

To try it out, first sign up at dubo.gg and then add it to your Slack workspace here. Use it with a rate limit at no cost. Contact us at founders@dubo.gg if you’d like a higher limit or just to let us know what you think of it.

Introducing the Dubo Chrome Extension

October 24, 2023

We've spent a lot of time writing Snowflake queries. The Postgres-flavored SQL syntax, the ease of setup, the simplicity of loading any file into it — there is a lot that data scientists love about Snowflake.

We also love GitHub Copilot, and we wish we could use it everywhere.

To combine our interests, we created a Chrome Extension that enables these Copilot-style autocompletions right in the Snowflake UI.

Dealing with a complex JOIN or trying to fill in a tedious list of column names? You can type out your SQL and with our extension Dubo will complete the query for you.

Fill-in-the-middle

You can even leave a comment describing the query and let Dubo write it.

Query from a comment

How it works

The Dubo Chrome Extension has a few components:

  • The basic nuts and bolts of a Chrome Extension, which handles the communication between the backend and the editor.
  • Our backend, which runs a custom large language model (LLM) to generate the completions and scans your database to provide autocomplete suggestions.
  • Our custom CodeMirror editor, which provides the UI for the completions.

Try it out

We think highly of our custom LLMs, but they're expensive to operate, so we're limiting the number of users we onboard initially.

If you'd like to give it a try, fill out our Typeform and we'll reach out to add you to the product.

Follow along

We're on Twitter at @dubo_ai and on LinkedIn @Mercator. If you have any questions or feedback, reach out to us at founders@mercator.tech.

Geospatial Queries in Census Explorer

April 24, 2023

For city planners studying transportation projects or business owners choosing new locations, analyzing data over travel routes or within a drive-time radius is crucial, but requires knowledge of routing engines and a geospatial analytics library like PostGIS or Shapely. Even for experienced data scientists, it's a labor-intensive process. To make it easier, Census Explorer now supports analytics along routes or within areas defined by a drive-time radius. This feature allows users to ask questions about Census Bureau data along the route between two cities, adjacent to a freeway, or within a specified drive time of a location.
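To give a feel for the kind of calculation involved, here is a deliberately simplified sketch: it sums population over areas whose centroids fall within a straight-line (great-circle) radius of a point. This is not how Census Explorer works. A real drive-time radius needs a routing engine, and real areas are polygons, not centroids. The field names (`lat`, `lon`, `population`) and the sample data in the usage note are made up.

```python
import math
from typing import Dict, List, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def population_within(
    center: Tuple[float, float], areas: List[Dict[str, float]], radius_km: float
) -> float:
    """Sum population over areas whose centroid lies within radius_km of center."""
    lat0, lon0 = center
    return sum(
        area["population"]
        for area in areas
        if haversine_km(lat0, lon0, area["lat"], area["lon"]) <= radius_km
    )
```

For instance, calling `population_within((0.0, 0.0), areas, 10)` with centroids at 0, roughly 5.6, and roughly 111 km from the center counts only the first two areas.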

For example, ask for the population between New York and LA.

NY to LA

Search for car ownership in neighborhoods around a sports stadium.

Oakland

We can also calculate Census statistics along roads, like the poverty rate along I-80.

I-80

Introducing Census Explorer

March 08, 2023

The US Census Bureau has a wealth of data, available in a variety of formats. However, it still takes a fair amount of technical acuity to answer questions like "Where are people severely rent burdened?" or "What are the most common places where people work from home?".

This is difficult for a few reasons:

  • Context. You must know where to find the data and how to extract it. Even with knowledge of the US Census Bureau's API, the documentation doesn't always work well in your browser. For example, try searching for median income among all the variables available.
  • Analysis. You may need to filter or aggregate data to answer your question. "Where are the incomes higher than the median income in the US?" requires generating that median value in order to answer the question.
  • Visualization. The shapefiles for the US Census data can be very large, which can make them difficult to visualize on a map without knowledge of specialized tooling like ArcGIS or QGIS.
  • Storage. The data is a few gigs in size, which can be burdensome depending on your tooling.
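The analysis point is easy to underestimate: a question about incomes above the median is really two steps, computing a cutoff and then filtering by it. The sketch below shows that shape with made-up numbers. Note the loud assumption: it uses the median across the listed areas as the cutoff, which is not the same as the national median household income, and `areas_above_median` is a hypothetical helper.

```python
from statistics import median
from typing import Dict, List


def areas_above_median(income_by_area: Dict[str, float]) -> List[str]:
    """Return the areas whose income exceeds the median across all listed areas."""
    cutoff = median(income_by_area.values())  # step 1: derive the comparison value
    # Step 2: keep only the areas above that value.
    return sorted(area for area, income in income_by_area.items() if income > cutoff)


# Made-up example data keyed by area code.
sample = {"A": 40000, "B": 55000, "C": 70000, "D": 90000}
above = areas_above_median(sample)
```

A natural-language interface has to infer both steps from a single sentence, which is part of why the question is harder than it looks.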

Census Explorer solves these problems by taking advantage of developments in natural language processing that we've seen across a number of GPT-inspired products, as well as our custom software for visualizing large data sets.

Real estate developers trying to understand a market, journalists trying to understand the demographics of a neighborhood, or anyone else trying to understand the US Census data can use Census Explorer to answer these questions and more.

Examples from Census Explorer

Boomtowns

You can see the echo of the 2000s real estate bubble in the ring of suburbs around Las Vegas.

Las Vegas

You can see the shale boom if you ask about housing in the 2010s. In some areas of North Dakota, more than half the homes are from the last decade.

North Dakota

Infrastructure planning

Census Explorer can help policymakers determine where to focus investment in public transportation and internet infrastructure. For example, you can identify areas with low-income people who endure punishingly long commutes, such as in the New York City metro area.

New York City

You can also see that Native American reservations in Arizona have among the lowest internet access in the country.

Arizona

Notes on the data

Right now Census Explorer covers a limited subset of the 2016-2021 5-Year American Community Survey data. It is also only available at the ZIP Code Tabulation Area (ZCTA) level. If you are interested in other geographies, surveys, or variables, let us know by dropping us a line at founders@mercator.tech.

Let us know what you think

We invite you to try Census Explorer for yourself and share your favorite results with us on Twitter (@dubo_ai) or LinkedIn (@Mercator). If you have any questions or feedback, reach out to us at founders@mercator.tech. Also, don't forget to sign up for our newsletter on our homepage to receive updates on new features and releases.