@kevinjacobs-delfi

Summary

This PR changes __eq__, relative_to, and is_relative_to to compare paths by filesystem identity (fsid) instead of comparing storage_options directly.

Fixes #532

The Problem

Previously, paths to the same resource were considered unequal whenever their storage_options differed, even if the differing options (such as anon=True) do not change which filesystem is accessed:

from upath import UPath

# Before this PR: False (unexpected!)
UPath('s3://bucket/file.txt') == UPath('s3://bucket/file.txt', anon=True)

# Before this PR: ValueError (unexpected!)
p1 = UPath('s3://bucket/dir/file.txt', anon=True)
p2 = UPath('s3://bucket/dir')
p1.relative_to(p2)
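
To make the failure mode concrete, here is a minimal sketch of the old rule, assuming equality compared protocol, path, and the full storage_options dict. `OldStylePath` is a hypothetical stand-in used only for illustration, not UPath's actual class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class OldStylePath:
    """Hypothetical model of the pre-PR equality rule, not UPath itself."""

    protocol: str
    path: str
    storage_options: dict[str, Any] = field(default_factory=dict)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OldStylePath):
            return NotImplemented
        # The whole storage_options dict takes part in the comparison, so an
        # option that only affects auth (anon=True) makes the paths unequal.
        return (self.protocol, self.path, self.storage_options) == (
            other.protocol,
            other.path,
            other.storage_options,
        )


a = OldStylePath("s3", "bucket/file.txt")
b = OldStylePath("s3", "bucket/file.txt", {"anon": True})
print(a == b)  # False
```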

The Solution

Use the fsid (filesystem identifier) to determine whether two paths are on the same filesystem. The fsid ignores options that don't affect which filesystem is accessed (e.g. authentication and performance settings) but takes into account options that do (e.g. endpoint_url, account_name, host and port).

from upath import UPath

# Now works as expected
UPath('s3://bucket/file.txt') == UPath('s3://bucket/file.txt', anon=True)  # True
UPath('/tmp/file.txt') == UPath('/tmp/file.txt', auto_mkdir=True)          # True

# Different endpoints are correctly identified as different filesystems
UPath('s3://bucket/file.txt') != UPath(
    's3://bucket/file.txt', endpoint_url='http://localhost:9000'
)  # True
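
A minimal sketch of the idea, assuming a per-protocol table of the options that determine filesystem identity. `sketch_fsid` and `_IDENTITY_KEYS` are hypothetical; assigning account_name to `abfs` and host/port to `sftp` is an assumption, and the id strings are illustrative. The PR's `_fallback_fsid()` in upath/_fsid.py may compute different values.

```python
from typing import Any, Mapping, Optional, Tuple

# Assumed mapping: protocol -> options that change *which* filesystem is
# addressed. Everything else (credentials, anon, timeouts, auto_mkdir, ...)
# is ignored. Protocols missing here get no fsid.
_IDENTITY_KEYS: dict[str, Tuple[str, ...]] = {
    "s3": ("endpoint_url",),
    "abfs": ("account_name",),
    "sftp": ("host", "port"),
}


def sketch_fsid(protocol: str, storage_options: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical fsid: a string for known filesystems, else None."""
    if protocol in ("", "file", "local"):
        # Local paths all live on one filesystem, whatever the options.
        return "local"
    keys = _IDENTITY_KEYS.get(protocol)
    if keys is None:
        # e.g. "memory" or an unknown protocol: fsid cannot be determined.
        return None
    identity = [f"{key}={storage_options.get(key)!r}" for key in keys]
    return "|".join([protocol, *identity])
```

Here `sketch_fsid("s3", {})` and `sketch_fsid("s3", {"anon": True})` both yield `"s3|endpoint_url=None"`, while passing an `endpoint_url` produces a different string, so only the endpoint separates the two S3 filesystems.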

Key Implementation Details

  • No filesystem instantiation: the fsid is computed from the protocol, storage_options, and the fsspec global config (fsspec.config.conf), so no filesystem object is created
  • Fallback behavior: for filesystems whose fsid cannot be determined (memory, unknown protocols), the comparison falls back to comparing storage_options
  • Returns None instead of raising: unlike fsspec's fs.fsid, which raises NotImplementedError, UPath.fsid returns None when the fsid cannot be determined
  • Verified against fsspec: audit tests check that the fallback matches the native fsid of LocalFileSystem and HTTPFileSystem
  • No caching: _fallback_fsid is not LRU-cached, because caching would have to handle mutable storage_options dicts and computing the fsid on the fly is cheap. If needed, UPath.fsid can later be turned into a cached property.
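
The global-config, fallback, and return-None points above can be combined roughly as follows. This is a sketch under stated assumptions, not the PR's code: `GLOBAL_CONF` is a plain dict standing in for `fsspec.config.conf`, only S3 is given identity keys, and `effective_options`, `fsid_or_none`, and `paths_equal` are hypothetical helpers working on (protocol, path, storage_options) triples.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical stand-in for fsspec.config.conf: protocol -> default kwargs.
GLOBAL_CONF: dict[str, dict[str, Any]] = {
    "s3": {"endpoint_url": "http://localhost:9000"},
}

# Assumed identity-relevant options; protocols not listed have no fsid.
IDENTITY_KEYS: dict[str, Tuple[str, ...]] = {"s3": ("endpoint_url",)}

PathTriple = Tuple[str, str, Mapping[str, Any]]


def effective_options(
    protocol: str,
    storage_options: Mapping[str, Any],
    conf: Mapping[str, Mapping[str, Any]],
) -> dict[str, Any]:
    # Global defaults first; explicit storage_options take precedence.
    merged = dict(conf.get(protocol, {}))
    merged.update(storage_options)
    return merged


def fsid_or_none(
    protocol: str,
    storage_options: Mapping[str, Any],
    conf: Mapping[str, Mapping[str, Any]],
) -> Optional[str]:
    keys = IDENTITY_KEYS.get(protocol)
    if keys is None:
        # Undeterminable fsid: return None rather than raising.
        return None
    opts = effective_options(protocol, storage_options, conf)
    return "|".join([protocol] + [f"{k}={opts.get(k)}" for k in keys])


def paths_equal(
    a: PathTriple, b: PathTriple, conf: Mapping[str, Mapping[str, Any]] = GLOBAL_CONF
) -> bool:
    (proto_a, path_a, opts_a), (proto_b, path_b, opts_b) = a, b
    if proto_a != proto_b or path_a != path_b:
        return False
    fsid_a = fsid_or_none(proto_a, opts_a, conf)
    fsid_b = fsid_or_none(proto_b, opts_b, conf)
    if fsid_a is None or fsid_b is None:
        # Fallback: compare storage_options directly, as before this PR.
        return dict(opts_a) == dict(opts_b)
    return fsid_a == fsid_b


explicit = ("s3", "bucket/file.txt", {"endpoint_url": "http://localhost:9000"})
implicit = ("s3", "bucket/file.txt", {"anon": True})
print(paths_equal(explicit, implicit))  # True: endpoint comes from the global config
```

Because explicit storage_options override the global defaults, a path that spells out the configured endpoint and one that relies on the config resolve to the same fsid; with an empty config the two would differ.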

Changes

  • Add upath/_fsid.py with _fallback_fsid() for computing fsid from protocol + storage_options + global config
  • Add fsid property to _UPathMixin and ProxyUPath
  • Update __eq__ in UPath and LocalPath to use fsid
  • Update relative_to and is_relative_to to use fsid
  • Add tests in upath/tests/test_fsid.py including audit tests
  • Document behavior in migration guide (docs/migration.md) and concepts (docs/concepts/upath.md)
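
The relative_to and is_relative_to change can be sketched as a same-filesystem check followed by ordinary pure-path arithmetic. `Loc`, `same_filesystem`, `sketch_relative_to`, and `sketch_is_relative_to` are hypothetical names, and the fsid strings are made up for illustration; UPath's real methods may be structured differently.

```python
from pathlib import PurePosixPath
from typing import Any, NamedTuple, Optional


class Loc(NamedTuple):
    """Hypothetical stand-in for a UPath: a path plus its precomputed fsid."""

    path: str
    fsid: Optional[str]
    storage_options: dict[str, Any]


def same_filesystem(a: Loc, b: Loc) -> bool:
    if a.fsid is None or b.fsid is None:
        # fsid unknown: fall back to comparing storage_options.
        return a.storage_options == b.storage_options
    return a.fsid == b.fsid


def sketch_relative_to(a: Loc, b: Loc) -> str:
    if not same_filesystem(a, b):
        raise ValueError(f"{a.path!r} and {b.path!r} are on different filesystems")
    # Same filesystem: the rest is plain pure-path arithmetic.
    return str(PurePosixPath(a.path).relative_to(b.path))


def sketch_is_relative_to(a: Loc, b: Loc) -> bool:
    try:
        sketch_relative_to(a, b)
    except ValueError:
        return False
    return True


p1 = Loc("bucket/dir/file.txt", "s3|endpoint_url=None", {"anon": True})
p2 = Loc("bucket/dir", "s3|endpoint_url=None", {})
print(sketch_relative_to(p1, p2))  # file.txt
```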

Test plan

  • All existing tests pass (393 passed)
  • New tests for fsid-based equality (16 tests)
  • Audit tests verify fallback matches native fsid implementations
  • Tests cover local, HTTP, S3, and memory filesystems
  • Tests cover relative_to and is_relative_to with matching/different fsids
  • Tests verify global config integration

🤖 Generated with Claude Code

Change __eq__, relative_to, and is_relative_to to compare paths based on
filesystem identity (fsid) rather than storage_options directly.

This fixes the issue where paths to the same resource were considered
unequal due to differing non-identity options (auth, performance, etc.):

    # Previously unequal, now equal (same S3 filesystem)
    UPath('s3://bucket/file.txt') == UPath('s3://bucket/file.txt', anon=True)

The fsid is computed from the protocol, storage_options, and fsspec global
config without instantiating the filesystem. For filesystems whose fsid
cannot be determined, the comparison falls back to storage_options.

Key changes:
- Add upath/_fsid.py with _fallback_fsid() for computing fsid
- Add fsid property to _UPathMixin and ProxyUPath
- Update __eq__ in UPath and LocalPath to use fsid
- Update relative_to and is_relative_to to use fsid
- Add tests for fsid-based equality
- Document behavior in migration guide and concepts docs

Closes fsspec#532

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Development

Successfully merging this pull request may close these issues.

RFC: Distinguish path equivalence from equality