Data Formulator - Microsoft Research's Open Source AI Data Visualization Tool
What is Data Formulator?
Data Formulator is an open-source AI-driven data visualization tool developed by Microsoft Research. It helps users quickly create rich visualizations through simple interactions and natural language commands. The tool combines a graphical user interface (GUI) with natural language (NL) input, so users can design charts either by dragging and dropping fields or by typing instructions directly. The AI handles the complex data transformations behind each chart, making it easier to produce insightful visualizations.
Key Features of Data Formulator
- Combination of Graphical Interface and Natural Language Input: Users can drag and drop data fields into chart properties or describe their needs in natural language. The AI will complete the data transformation and visualization based on the instructions.
- Support for Complex Data Transformations: Users can enter field names that do not yet exist in the data on the Concept Encoding Shelf, and the AI calculates and transforms the data according to their natural language prompt to produce the new visualization.
- Iterative Visualization Design: Data Formulator provides a "Data Thread" feature, allowing users to perform further operations on existing charts. The AI will update the charts based on natural language instructions.
- Result Verification and Error Correction: Users can view the AI-generated transformation data, visualizations, and code, and understand the data transformation process through the code explanation module. If errors are found, they can be corrected using the iterative mechanism of the data thread.
- Flexible Chart Style Adjustments: Users can adjust chart styles (such as color schemes or axis sorting) directly on the Concept Encoding Shelf without any additional data transformation, and the chart updates immediately.
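The data thread idea can be pictured as a tree of derived tables: each follow-up instruction starts from an earlier result, and correcting a mistake means branching again from an earlier step. The following is a minimal Python sketch under our own assumptions, not Data Formulator's implementation; the ThreadNode class, its derive and history methods, and the column names are all hypothetical, and plain functions stand in for the AI-generated transformations.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Table = list[dict]


@dataclass
class ThreadNode:
    """Hypothetical data-thread node: an instruction and the table it produced."""
    instruction: str
    table: Table
    parent: Optional["ThreadNode"] = None
    children: list["ThreadNode"] = field(default_factory=list)

    def derive(self, instruction: str, transform: Callable[[Table], Table]) -> "ThreadNode":
        # A follow-up step starts from this node's output, not from the raw data,
        # so the user only has to describe the change.
        child = ThreadNode(instruction, transform(self.table), parent=self)
        self.children.append(child)
        return child

    def history(self) -> list[str]:
        # Walk back to the root to recover the chain of instructions behind a chart.
        node, steps = self, []
        while node is not None:
            steps.append(node.instruction)
            node = node.parent
        return list(reversed(steps))


root = ThreadNode("load data", [
    {"year": 2000, "country": "A", "renewable_twh": 10, "total_twh": 100},
    {"year": 2020, "country": "A", "renewable_twh": 40, "total_twh": 120},
    {"year": 2020, "country": "B", "renewable_twh": 45, "total_twh": 60},
])
recent = root.derive("only years after 2010",
                     lambda t: [r for r in t if r["year"] > 2010])
top = recent.derive("only country B",
                    lambda t: [r for r in t if r["country"] == "B"])
# If a step turns out wrong, branch again from an earlier node instead of starting over.
fixed = root.derive("only country A",
                    lambda t: [r for r in t if r["country"] == "A"])

print(top.history())  # ['load data', 'only years after 2010', 'only country B']
print(len(fixed.table), len(root.children))  # 2 2
```

Keeping a parent link on every node is what makes both review (replaying the history of a chart) and correction (deriving a new branch from an earlier node) cheap in this sketch.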
Technical Principles of Data Formulator
- Multimodal Interaction Interface: Data Formulator combines a graphical user interface (GUI) with natural language (NL) input, allowing users to define visualization needs through drag-and-drop operations or direct natural language commands. This dual approach lets users express what they want in whichever way suits them best.
- Concept Binding and Data Transformation: Users first define the data concepts they plan to visualize through natural language or examples, then bind these concepts to visualization channels (such as x-axis, y-axis, color, etc.). Data Formulator automatically transforms the input data into the required format using its AI agent, generating the desired visualization.
- AI Agent and Code Generation: Data Formulator's backend is built on the Flask framework and receives frontend requests through a RESTful API. When the user clicks the "Formulate" button, the frontend sends a POST request to the backend's /derive-data endpoint. The backend then calls an AI agent (e.g., DataTransformationAgentV2), which generates Python code from the user's instructions and data, and executes that code to perform the data transformation.
- Data Processing and Feedback Mechanism: Data Formulator provides a data thread feature, allowing users to perform further operations on existing charts. The AI updates the charts based on natural language instructions. Data Formulator also offers a feedback mechanism, enabling users to view the AI-generated transformation data, visualizations, and code to ensure the results meet expectations.
- Open Source and Flexibility: Data Formulator is an open-source project. Users can install and run it locally with pip or use it directly in GitHub Codespaces.
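The request flow described above (frontend POST to /derive-data, agent generates Python code, backend executes it) can be sketched without a web framework so that it runs on its own. This is a hypothetical illustration, not the project's code: the request fields (instruction, encodings, table), the derive_data and stub_agent functions, and the agent's signature are assumptions. In the real backend this logic would sit behind a Flask route, and the code would come from an AI agent rather than a fixed stub. The sketch also shows concept binding in miniature: the fields bound to the x, y, and color channels determine which columns the generated code must produce.

```python
import json
from typing import Callable

# Hypothetical agent signature: (instruction, required fields, sample rows) -> Python source
# defining transform(rows). A real agent such as DataTransformationAgentV2 would call a model.
Agent = Callable[[str, list[str], list[dict]], str]


def stub_agent(instruction: str, fields: list[str], sample: list[dict]) -> str:
    # Stand-in for the model: returns fixed code that derives a field missing from the input.
    return (
        "def transform(rows):\n"
        "    return [{'year': r['year'], 'country': r['country'],\n"
        "             'non_renewable_twh': r['total_twh'] - r['renewable_twh']}\n"
        "            for r in rows]\n"
    )


def derive_data(request_body: str, agent: Agent) -> dict:
    """Sketch of the work behind a POST to /derive-data (request shape is assumed)."""
    req = json.loads(request_body)
    rows = req["table"]
    # Concept binding: every field bound to a channel must exist in the output table.
    needed = list(req["encodings"].values())
    code = agent(req["instruction"], needed, rows[:5])
    namespace: dict = {}
    exec(code, namespace)  # real deployments should sandbox model-written code
    derived = namespace["transform"](rows)
    missing = [f for f in needed if not derived or f not in derived[0]]
    return {"status": "error" if missing else "ok",
            "code": code, "rows": derived, "missing": missing}


body = json.dumps({
    "instruction": "energy that does not come from renewable sources",
    "encodings": {"x": "year", "y": "non_renewable_twh", "color": "country"},
    "table": [
        {"year": 2000, "country": "A", "renewable_twh": 10, "total_twh": 100},
        {"year": 2020, "country": "A", "renewable_twh": 40, "total_twh": 120},
    ],
})
result = derive_data(body, stub_agent)
print(result["status"], [r["non_renewable_twh"] for r in result["rows"]])  # ok [90, 80]
```

Returning the generated code alongside the rows, and flagging bound fields that the code failed to produce, is one simple way to support the review-and-correct loop the feature list mentions; how Data Formulator itself reports such failures is not covered here.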
Application Scenarios of Data Formulator
- Data Analysis and Visualization: Data Formulator helps users turn complex data into intuitive visualizations, making it easier to spot trends and patterns in the data.
- Data Concept Expansion and Calculation: Users can define data concepts that do not exist in the original dataset through natural language input. For example, when analyzing sustainable energy data, users can add "Percentage of Sustainable Energy" to the y-axis. Even if the original data does not provide percentage values directly, Data Formulator automatically calculates them and generates the corresponding visualization.
- Iteration and Optimization: Data Formulator supports iterative design based on existing charts. Users can modify and optimize existing charts through natural language instructions without needing to describe the entire design from scratch.
- Multimodal Interaction: Users can define visualization needs through a graphical interface (drag-and-drop operations) or natural language input, making Data Formulator suitable for users with different skill levels.
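To make the sustainable-energy example concrete, here is the kind of calculation the AI would need to produce when "Percentage of Sustainable Energy" is placed on the y-axis but the data only holds absolute quantities. It is a hand-written Python sketch, not output from Data Formulator; the column names renewable_twh and total_twh, the function name, and the choice to aggregate across countries per year are assumptions.

```python
from collections import defaultdict


def sustainable_pct_by_year(rows: list[dict]) -> list[dict]:
    """Derive a 'Percentage of Sustainable Energy' field the raw data lacks.

    Assumes hypothetical columns: year, renewable_twh, total_twh (one row per country-year).
    """
    renewable: dict = defaultdict(float)
    total: dict = defaultdict(float)
    for r in rows:
        # Sum the absolute quantities first; averaging per-country percentages
        # would give a small country the same weight as a large one.
        renewable[r["year"]] += r["renewable_twh"]
        total[r["year"]] += r["total_twh"]
    return [
        {"year": y, "pct_sustainable": round(100 * renewable[y] / total[y], 1)}
        for y in sorted(total)
        if total[y] > 0  # skip years with no recorded consumption
    ]


data = [
    {"year": 2000, "country": "A", "renewable_twh": 10, "total_twh": 100},
    {"year": 2000, "country": "B", "renewable_twh": 30, "total_twh": 60},
    {"year": 2020, "country": "A", "renewable_twh": 40, "total_twh": 120},
    {"year": 2020, "country": "B", "renewable_twh": 45, "total_twh": 60},
]
print(sustainable_pct_by_year(data))
# [{'year': 2000, 'pct_sustainable': 25.0}, {'year': 2020, 'pct_sustainable': 47.2}]
```

The aggregation choice matters: for 2000, averaging the two countries' percentages (10% and 50%) would give 30%, whereas dividing the summed quantities gives 25%, the actual share of total consumption.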