Convert SQL database schema tables and query scripts into organized YAML configuration properties.
The SQL to YAML conversion process uses a recursive descent parser to decompose Structured Query Language (SQL) statements into an Abstract Syntax Tree (AST). Once the AST is generated, the tool maps relational components, such as SELECT clauses, JOIN predicates, and WHERE filters, into the hierarchical key-value structure characteristic of YAML. This transformation lets developers treat database logic as configuration as code, enabling version control and schema validation without executing raw scripts against a production database.
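The AST-to-YAML mapping described above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the `SelectNode` class, the key names (`select`, `columns`, `from`, `where`), and the tiny `to_yaml` emitter are all assumptions made for the example, and the emitter writes values unquoted, so it is suitable for display only.

```python
from dataclasses import dataclass, field


@dataclass
class SelectNode:
    """Hypothetical AST node for a simple SELECT statement."""
    columns: list
    table: str
    where: list = field(default_factory=list)


def ast_to_mapping(node):
    """Map the AST node onto a nested key-value structure."""
    mapping = {"select": {"columns": list(node.columns), "from": node.table}}
    if node.where:
        mapping["select"]["where"] = list(node.where)
    return mapping


def to_yaml(value, indent=0):
    """Render non-empty nested dicts and lists of plain scalars as block-style YAML."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    else:  # a list of plain scalars
        for item in value:
            lines.append(f"{pad}- {item}")
    return "\n".join(lines)


ast = SelectNode(columns=["id", "email"], table="users", where=["active = true"])
print(to_yaml(ast_to_mapping(ast)))
```

Running the sketch prints a `select` block with a nested `columns` list, a `from` key, and a `where` list, which mirrors the clause structure of the original statement.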
This tool provides a granular mapping system that translates SQL dialects (PostgreSQL, MySQL, Snowflake) into a standardized YAML schema. Key features include automatic alias detection, where table aliases are converted into nested YAML objects, and dependency resolution, which identifies foreign key relationships to order the YAML output logically. By decoupling the query logic from the execution engine, teams can implement dry-run validations and linting across their data pipeline before deployment.
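Dependency resolution of this kind is typically a topological sort over foreign-key references. The sketch below assumes a hypothetical `foreign_keys` map in which each table lists the tables it references; it uses Python's standard `graphlib` module (Python 3.9+) and is not taken from the tool itself.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key map: each table lists the tables it references.
foreign_keys = {
    "order_items": {"orders", "products"},
    "orders": {"customers"},
    "customers": set(),
    "products": set(),
}


def order_tables(fk_map):
    """Return table names so that referenced tables come before their dependents."""
    return list(TopologicalSorter(fk_map).static_order())


ordered = order_tables(foreign_keys)
print(ordered)
```

Tables with no mutual dependency may appear in either order. Circular foreign keys would raise `graphlib.CycleError`, so a real implementation would need a policy for cycles.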
To integrate this conversion into an existing workflow, developers can use a CLI wrapper or an API endpoint. For instance, a Python script can convert each .sql file in a directory and collect the results into a single configuration manifest. The following pattern handles one file and can be applied to each file in turn:

    import yaml
    import sql_parser_lib


    def convert_sql_to_yaml(sql_file):
        """Read a SQL file and return its YAML representation as a string."""
        with open(sql_file, "r", encoding="utf-8") as f:
            sql_content = f.read()
        # Parse SQL to AST, then to a dictionary
        structured_data = sql_parser_lib.parse_to_dict(sql_content)
        # Export as a block-style YAML string
        return yaml.dump(structured_data, default_flow_style=False)


    print(convert_sql_to_yaml("analytics_query.sql"))

In bash-based environments, SQL can be piped through the tool to generate dbt-style schema.yml files:

    cat query.sql | sql2yaml --format dbt-core > models/schema.yml

The conversion process is performed entirely in-memory, ensuring that no PII (Personally Identifiable Information) or database credentials stored in SQL comments are persisted to disk unless explicitly configured. To maintain strict security parameters, the tool implements the following:
CONNECTION strings and PASSWORD literals during the parsing phase.
DROP TABLE) that trigger a security warning before YAML generation.
The target audience for this tool includes Analytics Engineers building dbt projects, DevOps Engineers automating database migrations, and Data Architects who require a language-agnostic representation of their data lineage.
The parser treats subqueries as nested objects within the YAML hierarchy, creating a parent-child relationship that mirrors the SQL nesting level. Joins are converted into a 'relationships' array in which each element specifies the join type (for example INNER, LEFT, or FULL OUTER), the target table, and the join condition. This preserves the structure of the relational logic even when the syntax is flattened into a configuration file.
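To make the 'relationships' array concrete, here is a rough sketch that pulls JOIN clauses out of a query with a regular expression. It is a deliberate simplification: the key names (`type`, `table`, `on`) are assumed, table aliases and subqueries are not handled, and a real parser would work from the AST rather than from pattern matching.

```python
import re

# Matches "[INNER|LEFT|RIGHT|FULL] [OUTER] JOIN <table> ON <condition>".
JOIN_RE = re.compile(
    r"\b(?:(INNER|LEFT|RIGHT|FULL)\s+(?:OUTER\s+)?)?JOIN\s+(\w+)\s+ON\s+(.+?)"
    r"(?=\s+(?:INNER|LEFT|RIGHT|FULL|JOIN|WHERE|GROUP|ORDER)\b|\s*;|\s*$)",
    re.IGNORECASE | re.DOTALL,
)


def extract_relationships(sql):
    """Return one mapping per JOIN clause; a bare JOIN is treated as INNER."""
    relationships = []
    for join_type, table, condition in JOIN_RE.findall(sql):
        relationships.append({
            "type": (join_type or "INNER").upper(),
            "table": table,
            "on": " ".join(condition.split()),  # collapse whitespace
        })
    return relationships


query = (
    "SELECT * FROM users "
    "INNER JOIN orders ON users.id = orders.user_id "
    "LEFT JOIN payments ON orders.id = payments.order_id "
    "WHERE payments.amount > 0"
)
relationships = extract_relationships(query)
print(relationships)
```

The resulting list can be passed to a YAML serializer as the value of a `relationships` key.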
While the primary function is SQL to YAML, the tool supports a bidirectional mapping mode for specific standardized schemas. By utilizing a template engine, the YAML configuration can be injected into a SQL generator that reconstructs the query based on the defined keys. However, custom SQL extensions or vendor-specific hints may be lost during this round-trip process unless explicitly defined in the YAML metadata.
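The reverse direction can be illustrated with the standard library's `string.Template`. The template text and the configuration keys (`columns`, `from`, `where`) are assumptions for this sketch; the tool's actual template engine and schema are not specified here.

```python
from string import Template

# Hypothetical template for a single-table SELECT statement.
SELECT_TEMPLATE = Template("SELECT $columns FROM $table$where;")


def render_select(config):
    """Rebuild a SELECT statement from a mapping that uses the assumed keys."""
    conditions = config.get("where", [])
    where_clause = " WHERE " + " AND ".join(conditions) if conditions else ""
    return SELECT_TEMPLATE.substitute(
        columns=", ".join(config["columns"]),
        table=config["from"],
        where=where_clause,
    )


config = {"columns": ["id", "email"], "from": "users", "where": ["active = true"]}
print(render_select(config))
```

Anything the template has no placeholder for, such as an optimizer hint, simply does not appear in the regenerated SQL, which is why the round trip can lose vendor-specific syntax.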
The tool employs a streaming parser that processes SQL tokens sequentially rather than loading the entire script into memory at once. For scripts exceeding 10,000 lines, the parser uses a chunking mechanism that breaks the query into Common Table Expressions (CTEs) and processes them as independent YAML nodes. This keeps memory use bounded and keeps processing time linear in the number of tokens in the SQL script.
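Sequential token processing can be approximated with a generator that reads one line at a time. This is only a sketch of the streaming idea: the token pattern is an assumption, and it ignores string literals, quoted identifiers, and multi-line comments, which a real tokenizer must handle.

```python
import io
import re

# Words and numbers form one token; any other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def stream_tokens(handle):
    """Yield SQL tokens line by line instead of reading the whole script."""
    for line in handle:
        yield from TOKEN_RE.findall(line)


script = io.StringIO("SELECT id\nFROM users\nWHERE id > 10;\n")
tokens = list(stream_tokens(script))
print(tokens)
```

Because `stream_tokens` is a generator, a consumer can process each token as it arrives, and only one line of the script is held in memory at a time. The `list(...)` call here is for demonstration only.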
Window functions are captured as specialized 'analytic' blocks within the YAML output. The tool identifies the OVER clause and separates the PARTITION BY and ORDER BY logic into distinct YAML attributes. This allows developers to analyze the windowing logic of a query without needing to execute it, which is critical for optimizing heavy analytical workloads.
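As a rough illustration of separating the OVER clause into attributes, the sketch below uses a regular expression. The key names (`function`, `arguments`, `partition_by`, `order_by`) are assumed, and the pattern does not support nested parentheses inside the window definition or frame clauses such as ROWS BETWEEN.

```python
import re

# func(args) OVER ( [PARTITION BY ...] [ORDER BY ...] )
WINDOW_RE = re.compile(
    r"(\w+)\s*\(([^()]*)\)\s*OVER\s*\(\s*"
    r"(?:PARTITION\s+BY\s+([^()]+?))?\s*"
    r"(?:ORDER\s+BY\s+([^()]+?))?\s*\)",
    re.IGNORECASE,
)


def split_list(text):
    """Split a comma-separated SQL expression list into trimmed items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def extract_windows(sql):
    """Return one mapping per window function found in the query."""
    return [
        {
            "function": func.upper(),
            "arguments": args.strip(),
            "partition_by": split_list(partition),
            "order_by": split_list(order),
        }
        for func, args, partition, order in WINDOW_RE.findall(sql)
    ]


query = (
    "SELECT SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at) "
    "AS running_total FROM orders"
)
windows = extract_windows(query)
print(windows)
```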
Since the tool performs static analysis and does not execute the SQL, there is no risk of traditional SQL injection during the conversion. However, to prevent 'YAML injection' or malicious configuration overrides, the tool sanitizes all input tokens and escapes special characters that could interfere with YAML parsing. All output is validated against a strict schema to ensure that only legitimate configuration keys are generated.
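One common way to neutralize YAML-significant characters is to emit every value as a double-quoted scalar. The sketch below relies on the fact that YAML 1.2 is designed as a superset of JSON, so a JSON-encoded string is generally a valid double-quoted YAML scalar. The helper names are hypothetical, and this is not necessarily how the tool sanitizes its input.

```python
import json


def yaml_scalar(value):
    """Quote a value as a double-quoted scalar so YAML syntax inside it stays inert."""
    return json.dumps(str(value))


def emit_pair(key, value):
    """Emit a single 'key: value' line with both sides quoted."""
    return f"{yaml_scalar(key)}: {yaml_scalar(value)}"


# An identifier crafted to smuggle an extra key into the output.
malicious = "users\nadmin: true  # injected"
line = emit_pair("table", malicious)
print(line)
```

The embedded newline is written as the two-character escape `\n` inside the quotes, so the output stays on one line and the injected `admin: true` remains part of the string value.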
The parser can be configured to either strip comments entirely or map them to a 'description' key within the corresponding YAML block. For example, a comment preceding a column definition in SQL will be converted into a metadata attribute for that specific column in the YAML file. This allows teams to maintain their business logic documentation directly within the code and carry it over into their configuration manifests.
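The comment-to-description mapping might look like the following sketch, which attaches `--` comments to the column definition on the next line. The `keep_comments` flag stands in for the strip-or-map setting, and the key names are assumptions. The column pattern is intentionally simple and would not handle types such as `DECIMAL(10, 2)`.

```python
import re

# column_name followed by a simple type such as INTEGER or VARCHAR(255)
COLUMN_RE = re.compile(r"(\w+)\s+([\w()]+)")


def columns_with_descriptions(ddl_body, keep_comments=True):
    """Attach '--' comments to the column definition that follows them."""
    columns, pending = [], []
    for raw in ddl_body.splitlines():
        line = raw.strip()
        if line.startswith("--"):
            pending.append(line[2:].strip())
            continue
        match = COLUMN_RE.match(line)
        if match:
            column = {"name": match.group(1), "type": match.group(2)}
            if keep_comments and pending:
                column["description"] = " ".join(pending)
            columns.append(column)
        pending = []  # a comment only applies to the line right after it
    return columns


ddl = """
-- Surrogate key for the customer
id INTEGER,
-- Contact address used for invoices
email VARCHAR(255)
"""
columns = columns_with_descriptions(ddl)
print(columns)
```

Calling the function with `keep_comments=False` returns the same columns without the `description` key, which corresponds to stripping comments entirely.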