@majinghe commented Dec 25, 2025

As discussed in #14638, MinIO is in maintenance mode, so this PR replaces MinIO with RustFS in the quick start. Local testing works fine.

Testing steps:

  • Generate the Spark default conf

     spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
     spark.sql.catalog.demo                 org.apache.iceberg.spark.SparkCatalog
     spark.sql.catalog.demo.type            rest
     spark.sql.catalog.demo.uri             http://rest:8181
     spark.sql.catalog.demo.io-impl         org.apache.iceberg.aws.s3.S3FileIO
     spark.sql.catalog.demo.warehouse       s3://warehouse/wh
     spark.sql.catalog.demo.s3.endpoint     http://rustfs:9000
     spark.sql.defaultCatalog               demo
     spark.eventLog.enabled                 true
     spark.eventLog.dir                     /home/iceberg/spark-events
     spark.history.fs.logDirectory          /home/iceberg/spark-events
     spark.sql.catalogImplementation        in-memory
     spark.sql.catalog.demo.s3.path-style-access true
    
  • Run the containers

    Start all containers with Docker Compose:

     docker compose up -d
    
  • Insert data

     docker exec -it spark-iceberg spark-sql
     Setting default log level to "WARN".
     To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
     25/12/25 01:41:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
     25/12/25 01:42:05 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
     Spark Web UI available at http://4dbf6384ac7a:4041
     Spark master: local[*], Application Id: local-1766626926764
     spark-sql ()> 
                 > CREATE NAMESPACE demo.nyc;
     Time taken: 4.89 seconds
     spark-sql ()> CREATE TABLE demo.nyc.taxis
                 > (
                 >   vendor_id bigint,
                 >   trip_id bigint,
                 >   trip_distance float,
                 >   fare_amount double,
                 >   store_and_fwd_flag string
                 > )
                 > PARTITIONED BY (vendor_id);
     Time taken: 6.362 seconds
     spark-sql ()> INSERT INTO demo.nyc.taxis
                 > VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
     Time taken: 17.706 seconds
     spark-sql ()> 
    
  • Verify the data

    Check the inserted data on the RustFS instance:

    [Screenshot 2025-12-25 09 44 13]
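The same check can also be run from the command line instead of the RustFS console, by listing the bucket with the MinIO client. This is a minimal sketch, not part of the PR: it assumes a helper container named `mc` that already has an alias called `rustfs` pointing at `http://rustfs:9000`, and the `DOCKER` variable and the function name are hypothetical. Only the `warehouse` bucket name comes from the configuration above.

```shell
# Assumption: the container CLI is "docker"; override DOCKER to use another one.
DOCKER="${DOCKER:-docker}"

# List every object in the warehouse bucket.
# Assumption: container "mc" exists and has an alias "rustfs" configured.
list_warehouse_objects() {
  "$DOCKER" exec mc /usr/bin/mc ls --recursive rustfs/warehouse
}

# Usage (requires the running stack):
#   list_warehouse_objects
```

After the INSERT above, the listing should show Parquet data files and Iceberg metadata files in the `warehouse` bucket.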

@github-actions bot added the docs label Dec 25, 2025
@majinghe changed the title from "replace minio with rustfs in quick start" to "docs: replace minio with rustfs in quick start" Dec 25, 2025
@myrust-go

Hello @manuzhang @majinghe,

After testing again, everything works perfectly. Thank you so much!

We look forward to long-term, stable support for S3 storage.

Screenshot

image

Command history

[+] Running 5/5
 ✔ Network root_iceberg_net  Created                                                                                                                                                                                                                                       0.0s
 ✔ Container iceberg-rest    Started                                                                                                                                                                                                                                       4.4s
 ✔ Container rustfs          Started                                                                                                                                                                                                                                       4.4s
 ✔ Container mc              Started                                                                                                                                                                                                                                       0.6s
 ✔ Container spark-iceberg   Started                                                                                                                                                                                                                                       0.7s
[root@iZj6c6r9c0a5dquqg47y2vZ ~]# docker exec -it spark-iceberg spark-sql

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/12/29 01:12:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/12/29 01:12:36 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://597b077fc8c5:4041
Spark master: local[*], Application Id: local-1766970756749
spark-sql ()>
            > CREATE NAMESPACE demo.nyc;
Time taken: 0.889 seconds
spark-sql ()> CREATE TABLE demo.nyc.taxis
            > (
            >   vendor_id bigint,
            >   trip_id bigint,
            >   trip_distance float,
            >   fare_amount double,
            >   store_and_fwd_flag string
            > )
            > PARTITIONED BY (vendor_id);
Time taken: 1.355 seconds
spark-sql ()>
            >
            > INSERT INTO demo.nyc.taxis
            > VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
Time taken: 2.652 seconds
spark-sql ()> SELECT * FROM demo.nyc.taxis;
1	1000371	1.8	15.32	N
1	1000374	8.4	42.13	Y
2	1000372	2.5	22.15	N
2	1000373	0.9	9.01	N
Time taken: 0.631 seconds, Fetched 4 row(s)

spark-defaults.conf


cat spark-defaults.conf
spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.demo                 org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.demo.type            rest
spark.sql.catalog.demo.uri             http://rest:8181
spark.sql.catalog.demo.io-impl         org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.demo.warehouse       s3://warehouse/wh
spark.sql.catalog.demo.s3.endpoint     http://rustfs:9000
spark.sql.defaultCatalog               demo
spark.eventLog.enabled                 true
spark.eventLog.dir                     /home/iceberg/spark-events
spark.history.fs.logDirectory          /home/iceberg/spark-events
spark.sql.catalogImplementation        in-memory
spark.sql.catalog.demo.s3.path-style-access true
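The last line of this file matters for a self-hosted endpoint such as `http://rustfs:9000`: without path-style access, an S3 client would typically address the bucket as a hostname prefix (virtual-hosted style), which is unlikely to resolve inside the compose network. A minimal sketch of a sanity check for that setting follows; the function name is hypothetical and the check only covers the `demo` catalog used here.

```shell
# Succeeds when the given conf file enables path-style access
# for the "demo" catalog; fails when the line is missing or not "true".
has_path_style_access() {
  grep -Eq '^[[:space:]]*spark\.sql\.catalog\.demo\.s3\.path-style-access[[:space:]]+true[[:space:]]*$' "$1"
}

# Usage:
#   has_path_style_access spark-defaults.conf && echo "path-style access enabled"
```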

docker-compose.yaml

cat docker-compose.yaml
services:
  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    build: spark/
    networks:
      iceberg_net:
    depends_on:
      - rest
      - rustfs
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
      - ./spark-defaults.conf:/opt/spark/conf/spark-defaults.conf
    environment:
      - AWS_ACCESS_KEY_ID=rustfsadmin
      - AWS_SECRET_ACCESS_KEY=rustfsadmin
      - AWS_REGION=us-east-1
    ports:
      - 8888:8888
      - 8080:8080
      - 10000:10000
      - 10001:10001
  rest:
    image: apache/iceberg-rest-fixture
    container_name: iceberg-rest
    networks:
      iceberg_net:
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=rustfsadmin
      - AWS_SECRET_ACCESS_KEY=rustfsadmin
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://rustfs:9000
      - CATALOG_S3_PATH__STYLE__ACCESS=true
  rustfs:
    image: rustfs/rustfs:latest
    container_name: rustfs
    environment:
      - RUSTFS_ACCESS_KEY=rustfsadmin
      - RUSTFS_SECRET_KEY=rustfsadmin
      - RUSTFS_VOLUMES=/data
      - RUSTFS_ADDRESS=0.0.0.0:9000
      - RUSTFS_CONSOLE_ADDRESS=0.0.0.0:9001
      - RUSTFS_CONSOLE_ENABLE=true
    networks:
      iceberg_net:
    ports:
      - 9001:9001
      - 9000:9000
  mc:
    depends_on:
      - rustfs
    image: minio/mc
    container_name: mc
    networks:
      iceberg_net:
    environment:
      - AWS_ACCESS_KEY_ID=rustfsadmin
      - AWS_SECRET_ACCESS_KEY=rustfsadmin
      - AWS_REGION=us-east-1
    entrypoint: |
      /bin/sh -c "
      until (/usr/bin/mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin) do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc rm -r --force rustfs/warehouse;
      /usr/bin/mc mb rustfs/warehouse;
      /usr/bin/mc policy set public rustfs/warehouse;
      tail -f /dev/null
      "
networks:
  iceberg_net:

Comment on lines 55 to 56
- AWS_ACCESS_KEY_ID=rustfsadmin
- AWS_SECRET_ACCESS_KEY=rustfsadmin
Contributor

Suggested change
-      - AWS_ACCESS_KEY_ID=rustfsadmin
-      - AWS_SECRET_ACCESS_KEY=rustfsadmin
+      - AWS_ACCESS_KEY_ID=admin
+      - AWS_SECRET_ACCESS_KEY=password

nit: can we keep this the same as before? I think it's a good idea to make the smallest possible change to the infra code. And if a change must be made, keep it isolated to the storage infra only (minio/rustfs).

Author

Sure. This is the default username and password for RustFS given by the user; admin/password also works fine. I have already changed those values.
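When the credentials are changed back, every service has to agree on the same key, otherwise the Spark and REST containers can no longer authenticate against the storage service. A minimal sketch of a consistency check over a compose file, assuming credentials are written as `- NAME=value` list entries and that the storage service uses `RUSTFS_ACCESS_KEY`, as in the compose file posted earlier in this thread; both function names are hypothetical.

```shell
# Print every distinct access-key value used in a compose file.
# Matches both AWS_ACCESS_KEY_ID and RUSTFS_ACCESS_KEY list entries.
distinct_access_keys() {
  grep -E '^[[:space:]]*-[[:space:]]*(AWS_ACCESS_KEY_ID|RUSTFS_ACCESS_KEY)=' "$1" \
    | sed -E 's/^[^=]*=//' \
    | sort -u
}

# Succeeds only when exactly one distinct access key is in use.
check_access_keys_consistent() {
  [ "$(distinct_access_keys "$1" | wc -l | tr -d ' ')" = "1" ]
}

# Usage:
#   check_access_keys_consistent docker-compose.yaml || echo "access keys differ"
```

The same pattern could be repeated for the secret key; it is left out here to keep the sketch short.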
