This is a Sidecar for the highly scalable Apache Cassandra database. For more information, see the Apache Cassandra web site and CIP-1.
This project is still a work in progress (WIP).
- Java >= 11 [1] (OpenJDK or Oracle)
- Apache Cassandra 4.0 or later. We depend on virtual tables, which were introduced in 4.0.
- Docker for running integration tests.
We depend on the Cassandra in-jvm dtest framework for testing. Because these jars are not published, you must manually build the dtest jars before you can build the project.
./scripts/build-dtest-jars.sh
The build script supports two parameters:
- REPO - the Cassandra git repository to use for the source files. This is helpful if you need to test with a fork of the Cassandra codebase.
  - default: [email protected]:apache/cassandra.git
- BRANCHES - a space-delimited list of branches to build.
  - default: "cassandra-4.1 trunk"
Remove any versions you may not want to test with. We recommend at least the latest (released) 4.X series and trunk.
See Testing for more details on how to choose which Cassandra versions to use while testing.
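As a rough illustration of how the two parameters combine, the sketch below wraps the build script in Python. It assumes the script reads REPO and BRANCHES from environment variables, which this README does not state explicitly. The helper names dtest_build_env and build_dtest_jars are hypothetical.

```python
import os
import subprocess


def dtest_build_env(repo=None, branches=None, base_env=None):
    """Return an environment for the build script (hypothetical helper).

    Assumption: the script reads REPO and BRANCHES from the environment.
    Parameters left unset fall back to the script's own defaults.
    """
    env = dict(os.environ if base_env is None else base_env)
    if repo is not None:
        env["REPO"] = repo
    if branches:
        # BRANCHES is a space-delimited list of branch names.
        env["BRANCHES"] = " ".join(branches)
    return env


def build_dtest_jars(repo=None, branches=None):
    """Run the build script from the repository root with the chosen parameters."""
    subprocess.run(
        ["./scripts/build-dtest-jars.sh"],
        env=dtest_build_env(repo, branches),
        check=True,
    )
```

Calling build_dtest_jars(branches=["cassandra-4.1", "trunk"]) would then build only those two branches, provided the assumption about environment variables holds.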
For multi-node in-jvm dtests, network aliases need to be set up for each Cassandra node. The tests assume each node's IP address is 127.0.0.x, where x is the node id.
For example, if you populate your cluster with 3 nodes, create interfaces for 127.0.0.2 and 127.0.0.3 (the first node uses 127.0.0.1).
To get up and running, create a temporary alias for every node except the first:
for i in {2..20}; do sudo ifconfig lo0 alias "127.0.0.${i}"; done
Note that this does not persist across reboots, so you'll have to run it every time you restart.
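To make the addressing rule concrete, here is a minimal sketch that lists the alias commands needed for a cluster of a given size. The function name alias_commands is hypothetical, and the lo0 interface is taken from the command above. The sketch only builds the command strings and does not run them.

```python
def alias_commands(node_count, interface="lo0"):
    """Return one alias command per node except the first.

    Node 1 uses 127.0.0.1, which already exists, so aliases start at node 2.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [
        f'sudo ifconfig {interface} alias "127.0.0.{node_id}"'
        for node_id in range(2, node_count + 1)
    ]


if __name__ == "__main__":
    # Print the commands for a 3-node cluster.
    for command in alias_commands(3):
        print(command)
```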
After you clone the git repo, you can use the gradle wrapper to build and run the project. Make sure you have
Apache Cassandra running on the host & port specified in conf/sidecar.yaml.
$ ./gradlew run
Alternatively, you can run against a local CCM cluster. Cassandra Sidecar provides a configuration for a 3-node
CCM cluster named sidecardemo. You can use the gradle wrapper to run the project connected to a 3-node CCM cluster
as follows:
$ ./gradlew run -Dsidecar.config=file:///$PWD/examples/conf/sidecar-ccm.yaml
Please see samples for details.
When setting up a Cassandra instance, make sure its data directories match the paths configured in sidecar.yaml. Otherwise, update the data directory paths in sidecar.yaml to point to the correct directories so that the stream APIs work.
Apache Cassandra Sidecar supports Change Data Capture (CDC) to stream table mutations to Apache Kafka. This section describes how to configure and run Sidecar with CDC enabled.
- Apache Cassandra 4.0+ with CDC support
- Apache Kafka cluster
- Sidecar configured with schema management enabled
Edit your cassandra.yaml configuration file and enable CDC:
cdc_enabled: true
Restart your Cassandra instance for this change to take effect.
Edit your sidecar.yaml configuration file with the following settings:
sidecar:
  # Enable schema management (required for CDC)
  schema:
    is_enabled: true
    keyspace: sidecar_internal
    replication_strategy: SimpleStrategy
    replication_factor: 3
  # Enable CDC feature
  cdc:
    enabled: true
    config_refresh_time: 10s
    table_schema_refresh_time: 60s
    segment_hardlink_cache_expiry: 1m
Configuration Parameters:
- schema.is_enabled: Must be true for CDC to function. Creates the sidecar_internal keyspace for CDC state management.
- cdc.enabled: Enables the CDC feature in Sidecar.
- cdc.config_refresh_time: How frequently CDC configuration is refreshed from the database.
- cdc.table_schema_refresh_time: How frequently table schemas are refreshed for CDC-enabled tables.
- cdc.segment_hardlink_cache_expiry: Cache expiration time for CDC segment hard links.
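The dependency between these settings can be checked before starting Sidecar. The sketch below is one possible pre-flight check, not part of Sidecar itself. It assumes the configuration has already been loaded into a plain dictionary and that schema and cdc are nested under the top-level sidecar key. The function name cdc_config_problems is hypothetical.

```python
def cdc_config_problems(config):
    """Return a list of problems found in a loaded sidecar.yaml dictionary.

    Assumption: schema and cdc live under the top-level "sidecar" key.
    """
    sidecar = config.get("sidecar") or {}
    schema = sidecar.get("schema") or {}
    cdc = sidecar.get("cdc") or {}

    problems = []
    # CDC needs schema management, which creates the sidecar_internal keyspace.
    if cdc.get("enabled") is True and schema.get("is_enabled") is not True:
        problems.append(
            "sidecar.schema.is_enabled must be true when sidecar.cdc.enabled is true"
        )
    return problems


EXAMPLE_CONFIG = {
    "sidecar": {
        "schema": {"is_enabled": True, "keyspace": "sidecar_internal"},
        "cdc": {"enabled": True, "config_refresh_time": "10s"},
    }
}

if __name__ == "__main__":
    print(cdc_config_problems(EXAMPLE_CONFIG))
```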
For each table you want to capture changes from, enable the CDC property using CQL:
-- For a new table
CREATE TABLE my_keyspace.my_table (
id text PRIMARY KEY,
name text,
value int
) WITH cdc = true;
-- For an existing table
ALTER TABLE my_keyspace.my_table WITH cdc = true;
Use the CDC configuration API endpoint to set up CDC parameters:
curl --request PUT \
--url http://localhost:9043/api/v1/services/cdc/config \
--header 'content-type: application/json' \
--data '{
"config": {
"datacenter": "datacenter1",
"env": "production",
"topic_format_type": "STATIC",
"topic": "cdc-events"
}
}'
CDC Configuration Parameters:
- datacenter: The datacenter name for this Sidecar instance.
- env: Environment identifier (e.g., production, staging, dev).
- topic_format_type: Determines how Kafka topic names are generated. Options:
  - STATIC: Use a single fixed topic name specified in the topic field
  - KEYSPACE: Format as {topic}-{keyspace}
  - KEYSPACETABLE: Format as {topic}-{keyspace}-{table}
  - TABLE: Format as {topic}-{table}
  - MAP: Use custom topic mapping (advanced)
- topic: Base Kafka topic name for CDC events.
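The topic formats listed above can be illustrated with a short sketch. It mirrors the documented patterns only and is not Sidecar's implementation. The function name topic_name is hypothetical, and MAP is left unimplemented because its mapping rules are not described here.

```python
def topic_name(format_type, topic, keyspace, table):
    """Derive a Kafka topic name from the documented topic_format_type patterns."""
    formats = {
        "STATIC": "{topic}",
        "KEYSPACE": "{topic}-{keyspace}",
        "KEYSPACETABLE": "{topic}-{keyspace}-{table}",
        "TABLE": "{topic}-{table}",
    }
    if format_type == "MAP":
        # MAP relies on a custom mapping that this sketch does not model.
        raise NotImplementedError("MAP uses a custom topic mapping")
    if format_type not in formats:
        raise ValueError(f"unknown topic_format_type: {format_type}")
    return formats[format_type].format(topic=topic, keyspace=keyspace, table=table)


if __name__ == "__main__":
    for kind in ("STATIC", "KEYSPACE", "KEYSPACETABLE", "TABLE"):
        print(kind, topic_name(kind, "cdc-events", "my_keyspace", "my_table"))
```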
Configure the Kafka producer settings using the Kafka configuration API endpoint:
curl --request PUT \
--url http://localhost:9043/api/v1/services/kafka/config \
--header 'content-type: application/json' \
--data '{
"config": {
"bootstrap.servers": "localhost:9092",
"key.serializer": "org.apache.kafka.common.serialization.StringSerializer",
"value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
"acks": "all",
"retries": "3",
"retry.backoff.ms": "200",
"enable.idempotence": "true",
"batch.size": "16384",
"linger.ms": "5",
"buffer.memory": "33554432",
"compression.type": "snappy",
"request.timeout.ms": "30000",
"delivery.timeout.ms": "120000",
"max.in.flight.requests.per.connection": "5",
"client.id": "cdc-producer"
}
}'
Key Kafka Producer Parameters:
- bootstrap.servers: Comma-separated list of Kafka broker addresses.
- key.serializer: Serializer for the message key (use StringSerializer).
- value.serializer: Serializer for the message value (use ByteArraySerializer for Avro).
- acks: Number of acknowledgments the producer requires (all for maximum durability).
- enable.idempotence: When set to true, prevents duplicate records caused by producer retries. End-to-end exactly-once semantics additionally require Kafka transactions.
- compression.type: Compression algorithm (snappy, gzip, lz4, zstd, or none).
For a complete list of Kafka producer configurations, see the Apache Kafka Producer Configuration Documentation.
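The CDC and Kafka configuration calls above share the same shape: a PUT to /api/v1/services/{service}/config with the settings wrapped in a "config" object. For scripted setups, the sketch below builds such a request with Python's standard library. The helper names build_config_request and send are hypothetical, and the base URL is the one used in the curl examples.

```python
import json
import urllib.request


def build_config_request(service, config, base_url="http://localhost:9043"):
    """Build a PUT request for a service configuration endpoint.

    service is the path segment used in the curl examples ("cdc" or "kafka").
    """
    url = f"{base_url}/api/v1/services/{service}/config"
    body = json.dumps({"config": config}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, send(build_config_request("cdc", {"datacenter": "datacenter1", "env": "production", "topic_format_type": "STATIC", "topic": "cdc-events"})) would submit the CDC configuration shown earlier to a running Sidecar.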
CDC events are serialized in Apache Avro format. Sidecar includes a built-in schema store (CachingSchemaStore) that:
- Automatically tracks CDC-enabled table schemas
- Converts CQL schemas to Avro schemas
- Refreshes schemas based on table_schema_refresh_time configuration
- Caches Avro schemas for performance
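The combination of caching and time-based refresh can be pictured with a small sketch. This is not how CachingSchemaStore is implemented. It is a simplified refresh-on-read cache written for illustration, and the class name RefreshingSchemaCache, the loader callback, and the injectable clock are all hypothetical.

```python
import time


class RefreshingSchemaCache:
    """Cache loader results and reload an entry once it is older than refresh_seconds."""

    def __init__(self, loader, refresh_seconds, clock=time.monotonic):
        self._loader = loader              # callable: table name -> schema
        self._refresh_seconds = refresh_seconds
        self._clock = clock                # injectable so the cache is easy to test
        self._entries = {}                 # table name -> (loaded_at, schema)

    def get(self, table):
        now = self._clock()
        entry = self._entries.get(table)
        # Load on first access, or reload when the cached entry is stale.
        if entry is None or now - entry[0] >= self._refresh_seconds:
            entry = (now, self._loader(table))
            self._entries[table] = entry
        return entry[1]
```

With refresh_seconds set to 60, matching the table_schema_refresh_time example, a schema change would be picked up at most about a minute after it happens in this simplified model.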
Each CDC event published to Kafka contains:
- Key: Table identifier (keyspace + table name)
- Value: Avro-serialized mutation data containing:
  - Partition key
  - Clustering key (if applicable)
  - Mutation type (INSERT, UPDATE, DELETE)
  - Column values
  - Timestamp
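As a mental model of the event layout listed above, the following sketch expresses the same fields as Python types. It is not the Avro schema Sidecar produces. All class and field names are hypothetical, and the dotted key format in event_key is an assumption, since the text only says the key combines keyspace and table name.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class MutationType(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"


@dataclass
class CdcEventValue:
    """Illustrative stand-in for the Avro-serialized message value."""
    partition_key: Dict[str, Any]
    mutation_type: MutationType
    timestamp: int  # the unit is not specified in this README
    column_values: Dict[str, Any] = field(default_factory=dict)
    clustering_key: Optional[Dict[str, Any]] = None  # absent for tables without one


def event_key(keyspace, table):
    """Build a message key from keyspace and table (dotted form is an assumption)."""
    return f"{keyspace}.{table}"


if __name__ == "__main__":
    event = CdcEventValue(
        partition_key={"id": "a1"},
        mutation_type=MutationType.INSERT,
        timestamp=0,
        column_values={"name": "example", "value": 1},
    )
    print(event_key("my_keyspace", "my_table"), event)
```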
After completing the configuration:
- Check Sidecar Logs: Verify CDC is enabled and connected to Kafka:
  grep -i "cdc" /path/to/sidecar.log
- Verify Configuration: Retrieve current CDC and Kafka configurations:
  # Get CDC configuration
  curl http://localhost:9043/api/v1/services/cdc/config
  # Get Kafka configuration
  curl http://localhost:9043/api/v1/services/kafka/config
  # Get all service configurations
  curl http://localhost:9043/api/v1/services
While Sidecar includes a built-in schema store, you can integrate with external schema registries by:
- Implementing a custom SchemaStore interface
- Registering your implementation via Guice dependency injection
- Configuring your schema registry connection details in the Kafka producer configuration
CDC not starting:
- Verify schema.is_enabled: true in sidecar.yaml
- Check Cassandra has cdc_enabled: true
- Ensure sidecar_internal keyspace exists and is accessible
No messages in Kafka:
- Verify tables have cdc = true property
- Check Kafka connectivity and broker availability
- Review Sidecar logs for errors: grep -i "kafka\|cdc" /path/to/sidecar.log
- Verify CDC and Kafka configurations are set via API endpoints
Schema errors:
- Ensure table schemas are stable (avoid frequent schema changes during CDC)
- Check table_schema_refresh_time is appropriate for your use case
- Review Sidecar logs for schema conversion errors
The test framework is set up to run 4.1 and 5.1 (Trunk) tests (see TestVersionSupplier.java) by default.
You can change this via the Java property cassandra.sidecar.versions_to_test by supplying a comma-delimited string.
For example, -Dcassandra.sidecar.versions_to_test=4.0,4.1,5.1.
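When the version list is generated by a script (for example in CI), the property value can be assembled as sketched below. The helper name versions_property is hypothetical. It only formats the flag and does not validate that the versions exist.

```python
def versions_property(versions):
    """Format the versions_to_test Java property from a list of version strings."""
    cleaned = [version.strip() for version in versions if version.strip()]
    if not cleaned:
        raise ValueError("at least one Cassandra version is required")
    # The property expects a comma-delimited string.
    return "-Dcassandra.sidecar.versions_to_test=" + ",".join(cleaned)


if __name__ == "__main__":
    print(versions_property(["4.0", "4.1", "5.1"]))
```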
You will need to use the "Add Projects" function of CircleCI to set up CircleCI on your fork. When prompted to create a branch, do not replace the CircleCI config; choose the option to do it manually. CircleCI will pick up the in-project configuration.
We warmly welcome and appreciate contributions from the community. Please see CONTRIBUTING.md if you wish to submit pull requests.
- Join us in #cassandra on ASF Slack and ask questions
- Subscribe to the Users mailing list by sending a mail to [email protected]
- Visit the community section of the Cassandra website for more information on getting involved.
- Visit the development section of the Cassandra website for more information on how to contribute.
- File issues with our Sidecar JIRA
[1] The Sidecar Client offers Java 1.8 compatibility and produces artifacts for both Java 1.8 and Java 11.