Add chdb in-process backend via interface="chdb" by wudidapaopao · Pull Request #753 · ClickHouse/clickhouse-connect

wudidapaopao · 2026-05-21T09:47:29Z

Summary

Adds an in-process backend that uses the embedded chdb engine instead of HTTP. Selected via clickhouse_connect.get_client(interface="chdb"). No ClickHouse server required.

The same NativeTransform byte parser the HTTP client uses is reused verbatim, so all existing type / dtype / streaming / DB-API / SQLAlchemy code paths work unchanged.
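The parser reuse presumably works because chdb can return the same Native-format bytes an HTTP response body would carry. To illustrate what such a byte parser does, here is a simplified sketch of reading one Native block: two LEB128 varints (column count, row count), then each column's name, type, and packed values. This is not clickhouse-connect's NativeTransform. It handles only `UInt64` columns and ignores any extra per-column serialization metadata that newer protocol versions may add. All function names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def write_varuint(value: int) -> bytes:
    """Encode an unsigned integer as LEB128 (ClickHouse VarUInt)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_string(text: str) -> bytes:
    """Length-prefixed UTF-8 string, as used for column names and types."""
    data = text.encode("utf-8")
    return write_varuint(len(data)) + data


def read_varuint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a LEB128 integer at pos; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def read_string(buf: bytes, pos: int) -> Tuple[str, int]:
    length, pos = read_varuint(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def parse_native_block(buf: bytes) -> Dict[str, List[int]]:
    """Parse one simplified Native block into {column_name: values}."""
    pos = 0
    num_columns, pos = read_varuint(buf, pos)
    num_rows, pos = read_varuint(buf, pos)
    columns: Dict[str, List[int]] = {}
    for _ in range(num_columns):
        name, pos = read_string(buf, pos)
        type_name, pos = read_string(buf, pos)
        if type_name != "UInt64":
            raise NotImplementedError(f"sketch only handles UInt64, got {type_name}")
        # UInt64 values are stored back to back, 8 bytes each, little-endian.
        columns[name] = list(struct.unpack_from(f"<{num_rows}Q", buf, pos))
        pos += 8 * num_rows
    return columns


# The same bytes could come from an HTTP body or an in-process engine.
sample = (write_varuint(1) + write_varuint(3)
          + write_string("n") + write_string("UInt64")
          + struct.pack("<3Q", 0, 1, 2))
print(parse_native_block(sample))  # {'n': [0, 1, 2]}
```

Nothing in `parse_native_block` depends on where `sample` came from, which is the property the PR relies on when it feeds chdb output to the existing parser.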

Usage examples

In-memory (default):

```python
import clickhouse_connect
client = clickhouse_connect.get_client(interface="chdb")
```

Persistent file path:

```python
client = clickhouse_connect.get_client(interface="chdb", chdb_path="/var/data/mydb")
```

Engine startup options as a dict:

```python
client = clickhouse_connect.get_client(
    interface="chdb",
    chdb_path="/var/data/mydb",
    chdb_options={"mode": "ro", "max_threads": 4},
)
```

Or inline in the path itself:

```python
client = clickhouse_connect.get_client(
    interface="chdb",
    chdb_path="/var/data/mydb?mode=ro&max_threads=4",
)
```
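The two forms above appear to be interchangeable. The hypothetical helper below (not part of the PR) shows how a `chdb_options` dict maps onto the inline query-string form, assuming the backend merges them this way. Letting dict values win over inline ones on conflict is also an assumption.

```python
from typing import Any, Mapping, Optional
from urllib.parse import parse_qsl, urlencode


def build_chdb_path(path: str, options: Optional[Mapping[str, Any]] = None) -> str:
    """Fold an options dict into the inline '?k=v&...' path form (hypothetical)."""
    base, _, query = path.partition("?")
    merged = dict(parse_qsl(query))  # options already written inline in the path
    merged.update({key: str(value) for key, value in (options or {}).items()})
    return f"{base}?{urlencode(merged)}" if merged else base


print(build_chdb_path("/var/data/mydb", {"mode": "ro", "max_threads": 4}))
# /var/data/mydb?mode=ro&max_threads=4
```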

ClickHouse server settings applied for the lifetime of the client (issued via SET k=v at construction):

```python
client = clickhouse_connect.get_client(
    interface="chdb",
    chdb_path=":memory:",
    database="analytics",
    settings={"max_block_size": 65536, "date_time_output_format": "iso"},
)
```
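The PR describes these settings as issued via `SET k=v` at construction. Below is a minimal sketch of that translation, assuming scalar values only. `quote_value` and `settings_to_set_statements` are hypothetical names, and the quoting rules are a simplification rather than the PR's actual code.

```python
from typing import Any, List, Mapping


def quote_value(value: Any) -> str:
    """Render a Python scalar as a ClickHouse literal (simplified)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def settings_to_set_statements(settings: Mapping[str, Any]) -> List[str]:
    """One SET statement per client-level setting, in insertion order."""
    statements = []
    for name, value in settings.items():
        if not name.isidentifier():  # refuse anything that is not a plain setting name
            raise ValueError(f"invalid setting name: {name!r}")
        statements.append(f"SET {name} = {quote_value(value)}")
    return statements


for stmt in settings_to_set_statements(
    {"max_block_size": 65536, "date_time_output_format": "iso"}
):
    print(stmt)
# SET max_block_size = 65536
# SET date_time_output_format = 'iso'
```

Each statement would then be executed once against the embedded session before the first user query.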

Async usage is symmetric:

```python
# Run inside a coroutine (or an asyncio-enabled REPL):
async with await clickhouse_connect.get_async_client(interface="chdb", chdb_path="/var/data/mydb") as c:
    r = await c.query("SELECT count() FROM events")
```

Checklist


Unit tests covering the common scenarios were added
A human-readable description of the changes was provided to include in CHANGELOG


joe-clickhouse · 2026-05-21T18:13:54Z

@wudidapaopao thanks! This is something I've been wanting to do for a while.

Before we merge, I want to do some more research on both sides, chdb-core and clickhouse-connect, to figure out the right architectural fit here. In its current form this adds another full client surface for us to maintain, including both sync and async paths, which is not ideal long term. Every new backend risks duplicating public methods, settings handling, streaming behavior, inserts, error handling, and tests.

In an ideal world, chdb-core could expose a loopback-only ephemeral HTTP endpoint, and our existing HttpClient could consume it like any other ClickHouse server. I had an agent take a quick look, and according to it, the ClickHouse HTTP server/handler code exists in the chdb-core tree and looks structurally reusable, but it is not currently wired into EmbeddedServer, and chDB's embedded path intentionally does no networking today. If upstream considers that approach viable, it would keep the clickhouse-connect side much simpler.
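For reference, a "loopback-only ephemeral" endpoint means binding to 127.0.0.1 on port 0, so the OS picks a free port that only local processes can reach. The stdlib sketch below shows only that mechanism, using a stand-in handler that answers `/ping` with `Ok.` followed by a newline, as a ClickHouse server does. It is not chdb-core code, and chdb-core does not expose such an endpoint today.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Stand-in handler: mimics only ClickHouse's /ping health check."""

    def do_GET(self):
        body = b"Ok.\n" if self.path == "/ping" else b""
        self.send_response(200 if body else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_loopback_server() -> ThreadingHTTPServer:
    # 127.0.0.1 keeps it loopback-only; port 0 asks the OS for a free ephemeral port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), PingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_loopback_server()
host, port = server.server_address
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any proxy
with opener.open(f"http://{host}:{port}/ping") as resp:
    body = resp.read()
print(body)  # b'Ok.\n'
server.shutdown()
server.server_close()
```

An existing HttpClient could then, in principle, be pointed at such an endpoint with `clickhouse_connect.get_client(host=host, port=port)`.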

If that is not viable, I think we should treat this as a backend refactor on our side rather than adding a separate client subclass. My current thinking is one public client API with pluggable execution backends, so chDB support is implemented behind the existing client instead of as a parallel client family. (Separate concern, but this would also leave room for future native TCP support.)
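A minimal sketch of the pluggable-backend shape described above, assuming a hypothetical `ExecutionBackend` protocol whose only job is to turn SQL plus settings into bytes. `RecordingBackend` stands in for both an HTTP transport and an in-process one. None of these names exist in clickhouse-connect today.

```python
from typing import Any, Dict, List, Mapping, Optional, Protocol, Tuple


class ExecutionBackend(Protocol):
    """What the shared client needs from any transport (hypothetical)."""

    def execute(self, sql: str, settings: Mapping[str, Any]) -> bytes: ...


class RecordingBackend:
    """Stand-in transport: records each call and returns canned bytes."""

    def __init__(self, name: str, response: bytes = b"1\n") -> None:
        self.name = name
        self.response = response
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def execute(self, sql: str, settings: Mapping[str, Any]) -> bytes:
        self.calls.append((sql, dict(settings)))
        return self.response


class Client:
    """One public API; settings handling is written once for every backend."""

    def __init__(self, backend: ExecutionBackend,
                 settings: Optional[Mapping[str, Any]] = None) -> None:
        self._backend = backend
        self._settings = dict(settings or {})

    def command(self, sql: str, settings: Optional[Mapping[str, Any]] = None) -> str:
        merged = {**self._settings, **(settings or {})}  # per-call overrides win
        return self._backend.execute(sql, merged).decode().strip()


http_like = RecordingBackend("http")
embedded = RecordingBackend("chdb")
for backend in (http_like, embedded):
    client = Client(backend, settings={"max_threads": 4})
    assert client.command("SELECT 1", settings={"max_threads": 1}) == "1"
```

Because settings merging lives in `Client`, adding a backend does not duplicate it, which is the maintenance concern raised above.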

Either way, I'd like to spend some time on this before merging. Thanks again for putting this together and for the very thorough test coverage. I'll post back here as I get through the research.

auxten · 2026-05-22T04:01:03Z

Thanks @joe-clickhouse for the thoughtful response, and for taking the time to think through the architectural fit on both sides — really appreciate it.

I strongly agree with the overall direction. If I may add a couple of thoughts from the chDB side:

One thing we've consistently tried hard to preserve in chDB is minimizing serialization/deserialization overhead — it's arguably one of the main reasons users reach for an embedded engine in the first place. A loopback-only HTTP endpoint inside chdb-core would definitely make the clickhouse-connect side much simpler, and that's genuinely appealing. But it would also reintroduce serialize/deserialize round-trips that aren't strictly necessary in the local/embedded mode, which can noticeably hurt performance and increase memory usage — exactly the kind of cost chDB users tend to come to chDB to avoid.

Relatedly, chDB already supports zero-copy read/write for pandas DataFrames. Keeping the in-process path (i.e. not going through a server boundary) preserves that property end-to-end, and I think it also opens up nicer downstream integrations — both deeper interop with clickhouse-connect itself, and a much smoother experience for users who live in the pandas ecosystem.
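To make the copy-versus-zero-copy distinction concrete, here is a stdlib-only illustration. It is not chDB's mechanism, which works with pandas/NumPy buffers. Deserializing creates new Python objects, while a `memoryview` reinterprets the existing buffer in place.

```python
import struct

values = (10, 20, 30)
# Native byte order, standard 8-byte signed integers.
buf = bytearray(struct.pack(f"={len(values)}q", *values))

copied = list(struct.unpack(f"={len(values)}q", buf))  # deserialize: new objects
view = memoryview(buf).cast("q")                       # zero-copy: same memory

struct.pack_into("=q", buf, 0, 99)  # mutate the underlying buffer in place
print(copied[0], view[0])  # 10 99
```

The copy keeps the old value because it no longer shares memory with `buf`. The view sees the change because it never left the buffer.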

So, just my two cents (please take it only as a suggestion): if users can switch the execution engine by changing a single place, without touching any of their existing code, I think that would offer the best developer experience. Your pluggable-backend idea sounds very aligned with this: existing clickhouse-connect code stays untouched while the zero-copy in-process path is available under the hood.
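From the user's side, "changing a single place" could be as small as the hypothetical helper below. The interface is read from configuration, and application code calls `clickhouse_connect.get_client(**client_kwargs())` unchanged. The `APP_*` environment variable names are invented for illustration. `interface="chdb"` and `chdb_path` come from this PR, and `host`/`port` are existing `get_client` parameters.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build get_client() kwargs from one config source (hypothetical names)."""
    env = os.environ if env is None else env
    interface = env.get("APP_CLICKHOUSE_INTERFACE", "http")
    if interface == "chdb":
        return {"interface": "chdb", "chdb_path": env.get("APP_CHDB_PATH", ":memory:")}
    return {
        "interface": interface,
        "host": env.get("APP_CLICKHOUSE_HOST", "localhost"),
        "port": int(env.get("APP_CLICKHOUSE_PORT", "8123")),
    }


# Application code is identical whichever engine is configured:
# client = clickhouse_connect.get_client(**client_kwargs())
print(client_kwargs({"APP_CLICKHOUSE_INTERFACE": "chdb"}))
```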

Happy to dig deeper on either direction with you — whatever helps the research move forward. Thanks again.

wudidapaopao added 4 commits May 21, 2026 02:18

Add chdb in-process backend via interface="chdb"

2caf0ba

Drop sys_platform marker on chdb extra so Windows fails at install time

0466ca5

Drop chdb DataFrame fast path and clean up dead code

1230dbf

Restore chdb session settings after command() per-call overrides

e70649f

wudidapaopao requested review from joe-clickhouse and peter-leonov-ch as code owners May 21, 2026 09:47

Add CHANGELOG entry for chdb backend

b20f76c

wudidapaopao removed request for joe-clickhouse and peter-leonov-ch May 21, 2026 09:51

wudidapaopao added 3 commits May 21, 2026 11:00

Wire chdb parameter binding and expand test coverage

86ad2c9

Add chdb tests for arrow/numpy/streaming/DBAPI parity with HTTP

889a307

Fix chdb backend bugs and behavior gaps found via HTTP integration test parity

bdb483c

wudidapaopao requested review from ShawnChen-Sirius and auxten May 21, 2026 15:18
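Commit e70649f restores chdb session settings after per-call `command()` overrides. Because an embedded session is stateful, a per-call setting would presumably otherwise leak into later calls. Below is a minimal sketch of the save-and-restore pattern, with a plain dict standing in for the session's settings. In the real backend, each assignment would presumably be a `SET` statement, but that mapping is an assumption. `per_call_overrides` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Mapping

_MISSING = object()  # marks settings that had no session-level value


@contextmanager
def per_call_overrides(session: Dict[str, Any],
                       overrides: Mapping[str, Any]) -> Iterator[Dict[str, Any]]:
    """Apply overrides for one call, then put the previous values back."""
    saved = {key: session.get(key, _MISSING) for key in overrides}
    session.update(overrides)
    try:
        yield session
    finally:  # restore even if the call raised
        for key, old in saved.items():
            if old is _MISSING:
                session.pop(key, None)
            else:
                session[key] = old


session = {"max_threads": 4}
with per_call_overrides(session, {"max_threads": 1, "readonly": 1}):
    assert session == {"max_threads": 1, "readonly": 1}
assert session == {"max_threads": 4}
```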

wudidapaopao wants to merge 8 commits into ClickHouse:main from wudidapaopao:feat/chdb-backend