Skip to content

fix: replace f-string SQL with psycopg2.sql for safe identifier handling#32

Open
wicky-zipstack wants to merge 1 commit intomainfrom
fix/sql-injection-sample-project
Open

fix: replace f-string SQL with psycopg2.sql for safe identifier handling#32
wicky-zipstack wants to merge 1 commit intomainfrom
fix/sql-injection-sample-project

Conversation

@wicky-zipstack
Copy link
Copy Markdown
Contributor

What

  • Replace all f-string SQL interpolation in sample_project.py with psycopg2.sql.SQL, sql.Identifier, and sql.Literal

Why

  • VAPT finding: SQL injection vulnerability in sample project creation code (OR-1226)
  • f-string interpolation used directly in SQL queries for database names, user names, schema names, and passwords
  • Even though inputs are sanitized with re.sub, proper parameterization is the industry standard

How

  • CREATE/DROP DATABASE: sql.Identifier for database names
  • CREATE/DROP USER: sql.Identifier for user names, sql.Literal for passwords
  • GRANT/ALTER SCHEMA: sql.Identifier for schema and user names
  • WHERE datname = ...: sql.Literal for value parameters
  • Schema DDL: sql.Identifier for schema names in DROP/CREATE
  • All changes in the base class SampleProject — covers all 4 sample projects (Jaffle Shop Starter/Final, DVD Rental Starter/Final)

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • Low risk. psycopg2.sql generates identical SQL output but with proper quoting. The only difference is identifiers are now double-quoted (e.g., "my_database" instead of my_database), which is valid PostgreSQL and handles reserved words correctly. Sample project creation should work identically.

Database Migrations

  • None

Env Config

  • None

Relevant Docs

Related Issues or PRs

  • Fixes OR-1226: Fix SQL injection in sample_project.py
  • Parent Epic: OR-1219 (Public Repo Launch Checklist)

Dependencies Versions

  • None (psycopg2 already a dependency)

Notes on Testing

  • Create all sample projects (Jaffle Shop Starter, Jaffle Shop Final, DVD Rental Starter, DVD Rental Final)
  • Verify seed data loads correctly
  • Add transformations and run them
  • Verify end-to-end flow works

Screenshots

N/A

Checklist

Replace all raw f-string SQL interpolation in sample_project.py with
psycopg2.sql.SQL, sql.Identifier, and sql.Literal for proper escaping:

- CREATE/DROP DATABASE, CREATE/DROP USER — sql.Identifier for names
- GRANT, ALTER SCHEMA — sql.Identifier for schema and user names
- WHERE datname = ... — sql.Literal for value parameters
- CREATE USER ... PASSWORD — sql.Literal for password value
- Schema DDL (DROP/CREATE SCHEMA) — sql.Identifier for schema names
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR addresses a VAPT-identified SQL injection vulnerability (OR-1226) in sample_project.py by replacing all f-string SQL interpolation with psycopg2.sql.SQL, sql.Identifier, and sql.Literal across the SampleProject base class, covering all four sample project variants (Jaffle Shop Starter/Final, DVD Rental Starter/Final).

Key changes:

  • All database identifiers (database names, user names, schema names) are now wrapped in sql.Identifier, ensuring proper double-quoting and protection against injection.
  • Value parameters (passwords, datname WHERE comparisons) are wrapped in sql.Literal.
  • The dead-code variable user_check_query (which was defined but never executed) is correctly removed — the actual user-existence check in user_exist() was already correctly parameterized with %s.
  • The execute_sql_queries method's type annotation was loosened from list[str] to list to accommodate sql.Composable objects.
  • The implementation is correct: sql.Identifier is used for identifiers and sql.Literal for values throughout, consistent with psycopg2 documentation.
  • One minor note: the type hint on execute_sql_queries could be tightened to list[sql.Composable] to preserve static analysis coverage after removing the str constraint.

Confidence Score: 5/5

  • This PR is safe to merge — it is a well-scoped, correctly implemented security fix with no introduced regressions.
  • All changes are mechanical substitutions of f-string SQL with properly parameterized psycopg2.sql constructs. Identifiers and literals are consistently distinguished. The only open finding is a P2 style suggestion to tighten the type annotation on execute_sql_queries, which does not affect runtime behavior.
  • No files require special attention — the single changed file applies the fix uniformly and completely.

Important Files Changed

Filename Overview
backend/backend/application/sample_project/sample_project.py SQL injection remediation: replaces all f-string SQL interpolation with psycopg2.sql.SQL/Identifier/Literal; removes unused user_check_query; loosens execute_sql_queries type annotation from list[str] to list.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant SP as SampleProject
    participant PG as PostgreSQL (admin DB)
    participant NPG as PostgreSQL (new DB)

    C->>SP: load_sample_project()
    SP->>SP: check_project_limit()
    SP->>SP: create_new_database()
    SP->>PG: user_exist() — SELECT 1 FROM pg_roles WHERE rolname = %s
    PG-->>SP: True / False
    SP->>PG: CREATE DATABASE "db_name" TEMPLATE "master_db" (sql.Identifier)
    SP->>PG: CREATE USER "user_name" WITH ENCRYPTED PASSWORD 'pwd' (sql.Identifier + sql.Literal)
    SP->>PG: GRANT ALL PRIVILEGES ON DATABASE "db_name" TO "user_name"
    SP->>SP: _grant_schema_permissions_on_new_db()
    SP->>NPG: GRANT USAGE ON SCHEMA "raw"|"dev"|"stg"|"prod" TO "user_name"
    SP->>NPG: GRANT SELECT/INSERT/UPDATE/DELETE ON ALL TABLES IN SCHEMA {} TO {}
    SP->>NPG: ALTER DEFAULT PRIVILEGES IN SCHEMA {} GRANT ... TO {}
    SP->>NPG: ALTER SCHEMA "dev"|"stg"|"prod" OWNER TO "user_name"
    SP->>SP: create_project_connection()
    SP->>SP: create_schemas() [only if _clone_db=False]
    SP->>PG: DROP/CREATE SCHEMA "raw"|"dev"|"stg"|"prod" (sql.Identifier)
    SP->>SP: upload_and_run_csv()
    SP->>SP: create_and_load_models()
    SP-->>C: project details
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: backend/backend/application/sample_project/sample_project.py
Line: 157

Comment:
**Type annotation weakened to bare `list`**

The annotation was downgraded from `list[str]` to `list`. Since the method now exclusively receives `sql.Composable` objects, a more accurate annotation would help IDEs and static analysis tools catch callers that still pass raw strings.

```suggestion
    def execute_sql_queries(self, statements: list["sql.Composable"]):
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: replace f-string SQL with psycopg2...." | Re-trigger Greptile

return self._postgres_connection

def execute_sql_queries(self, statements: list[str]):
def execute_sql_queries(self, statements: list):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Type annotation weakened to bare list

The annotation was downgraded from list[str] to list. Since the method now exclusively receives sql.Composable objects, a more accurate annotation would help IDEs and static analysis tools catch callers that still pass raw strings.

Suggested change
def execute_sql_queries(self, statements: list):
def execute_sql_queries(self, statements: list["sql.Composable"]):
Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/backend/application/sample_project/sample_project.py
Line: 157

Comment:
**Type annotation weakened to bare `list`**

The annotation was downgraded from `list[str]` to `list`. Since the method now exclusively receives `sql.Composable` objects, a more accurate annotation would help IDEs and static analysis tools catch callers that still pass raw strings.

```suggestion
    def execute_sql_queries(self, statements: list["sql.Composable"]):
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Updated the type annotation would be more accurate as list[sql.Composable]. Will address in a follow-up — not a functional issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant