Implement perceptual hashing with BK-tree indexing for O(log n) similarity search #319
Draft
…uration Co-authored-by: adulau <[email protected]>
Copilot AI changed the title from "[WIP] Implement phash functionality with efficient BK-tree indexing" to "Implement perceptual hashing with BK-tree indexing for O(log n) similarity search" on Jan 29, 2026
Problem
AIL needs perceptual hash (phash) support for image similarity detection. A naive implementation enumerates every stored phash on each similarity query, which is O(n) per query and doesn't scale beyond 10k images.
Solution
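Implement a BK-tree (Burkhard-Keller tree) index in KVRocks, using Hamming distance as the metric. Pruning subtrees via the triangle inequality reduces a similarity search from O(n) to roughly O(log n) for small distance thresholds; with a large threshold the search can still visit every node.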
Tree structure:
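As a rough illustration of the structure, the sketch below builds a BK-tree in memory with nested dicts. It is not the KVRocks-backed code from this PR: the node layout and the `bktree_insert` name are assumptions made for the example. Each node holds one hash, and each child edge is labelled with the Hamming distance between the child and its parent.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def bktree_insert(root, value):
    """Insert value into a BK-tree of nested dicts and return the root.

    Hypothetical node layout: {"value": int, "children": {distance: node}}.
    """
    if root is None:
        return {"value": value, "children": {}}
    node = root
    while True:
        d = hamming_distance(value, node["value"])
        if d == 0:
            return root  # hash already indexed
        child = node["children"].get(d)
        if child is None:
            # No child at this distance yet: attach the new node here.
            node["children"][d] = {"value": value, "children": {}}
            return root
        node = child  # Descend along the edge labelled d.


root = None
for h in (0b0000, 0b0001, 0b0011, 0b0111, 0b1000):
    root = bktree_insert(root, h)
```

Here `0b1000` ends up under `0b0001` rather than directly under the root, because both are at distance 1 from the root and only one child is allowed per distance.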
Search pruning:
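The pruning rule can be sketched as follows, assuming an in-memory tree of nested dicts rather than the KVRocks layout used by this PR. If the query is at distance d from a node and the threshold is t, the triangle inequality means only children on edges in the range [d - t, d + t] can contain matches. The function name and node layout are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def bktree_search(root, query, threshold):
    """Return (value, distance) pairs within threshold of query."""
    results = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        d = hamming_distance(query, node["value"])
        if d <= threshold:
            results.append((node["value"], d))
        for edge, child in node["children"].items():
            # Triangle inequality: skip subtrees that cannot hold a match.
            if d - threshold <= edge <= d + threshold:
                stack.append(child)
    return sorted(results, key=lambda r: (r[1], r[0]))


# Small hand-built tree; edge labels are distances to the parent.
tree = {"value": 0b0000, "children": {
    1: {"value": 0b0001, "children": {
        2: {"value": 0b1000, "children": {}},
    }},
    2: {"value": 0b0011, "children": {}},
    3: {"value": 0b0111, "children": {}},
}}

matches = bktree_search(tree, 0b0001, 1)
```

For this query the subtrees holding `0b0111` and `0b1000` are never visited, since their edge labels fall outside the allowed range.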
Implementation
Core Objects
- Phashs.py: Phash object class, BK-tree operations (add, search, rebuild), Hamming distance calculation
- imagehash library for DCT-based 64-bit phash computation
Processing Pipeline
Integration Points
- correlations_engine.py: Added phash: [image, phash] correlation type
- ail_objects.py: Registered Phash in OBJECTS_CLASS
- ail_core.py: Added phash to AIL_OBJECTS sets
- modules.cfg: ImagePhash subscribes to Image queue, publishes to PhashCorrelation queue
Configuration
UI
- /objects/phashes with daterange filtering
Maintenance
- rebuild_phash_index.py: Rebuilds the BK-tree from all existing phash objects (for migrations or corruption recovery)
Dependencies
- imagehash>=4.3.0 added to requirements.txt
Testing
27 tests covering phash object operations, hamming distance edge cases, BK-tree insertion/search with various thresholds, and index rebuilding.
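To make the Hamming distance edge cases concrete, here is a minimal sketch that assumes phashes are handled as 16-character hex strings, a common serialization for 64-bit hashes. The function name `hamming_distance_hex` and the choice to raise on mismatched lengths are assumptions for illustration, not the behavior of the code in this PR.

```python
def hamming_distance_hex(phash_a: str, phash_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    if len(phash_a) != len(phash_b):
        raise ValueError("phashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(phash_a, 16) ^ int(phash_b, 16)).count("1")


zeros = "0" * 16
ones = "f" * 16

same = hamming_distance_hex(zeros, zeros)                # identical hashes
one_bit = hamming_distance_hex(zeros, "0000000000000001")  # single bit flipped
all_bits = hamming_distance_hex(zeros, ones)             # every bit differs

try:
    hamming_distance_hex(zeros, "ff")
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```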
Original prompt
Phash Implementation with Efficient BK-Tree Indexing
Overview
This PR implements perceptual hashing (phash) functionality for AIL with an efficient BK-tree indexing structure to enable fast similarity detection without full enumeration. This builds upon the work in PR #318 by @cavedave and addresses the performance concerns about enumerating all phash objects.
Key Improvements Over PR #318
1. BK-Tree Indexing for Efficient Search
2. All Original Features from PR #318
- imagehash library (64-bit DCT-based phash)
Implementation Details
BK-Tree Index Structure
The BK-tree is stored in KVRocks using the following keys:
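The key names are not reproduced here. Purely as an illustration of how a BK-tree can be flattened into a key-value store, the sketch below uses a plain dict in place of KVRocks, with one key for the root and one Redis-style hash of children per node. The key names `bktree:root` and `bktree:children:<phash>` and the function names are hypothetical and are not taken from this PR.

```python
ROOT_KEY = "bktree:root"              # hypothetical key name
CHILDREN_PREFIX = "bktree:children:"  # hypothetical key prefix


def hamming_distance_hex(a: str, b: str) -> int:
    """Count differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def add_to_index(store: dict, phash: str) -> None:
    """Insert phash by walking from the root along distance-labelled edges."""
    current = store.get(ROOT_KEY)
    if current is None:
        store[ROOT_KEY] = phash
        return
    while True:
        d = hamming_distance_hex(phash, current)
        if d == 0:
            return  # already indexed
        children = store.setdefault(CHILDREN_PREFIX + current, {})
        if d not in children:
            children[d] = phash
            return
        current = children[d]


def rebuild_index(store: dict, all_phashes) -> None:
    """Drop every index key, then re-insert all known phashes."""
    stale = [k for k in store if k == ROOT_KEY or k.startswith(CHILDREN_PREFIX)]
    for key in stale:
        del store[key]
    for phash in all_phashes:
        add_to_index(store, phash)


store = {"unrelated:key": "kept"}
rebuild_index(store, ["0000000000000000", "0000000000000001", "0000000000000003"])
```

With this layout a search only needs to read the child hashes of the nodes it actually visits, and a rebuild leaves unrelated keys untouched.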
How it works:
Performance:
Files to Create/Modify
New Files (from PR #318)
- bin/lib/objects/Phashs.py - Enhanced with BK-tree functions:
  - Phash class: Represents a perceptual hash value
  - Phashs collection class: Manages Phash objects
  - add_to_bktree_index(): Insert phash into BK-tree
  - search_bktree_index(): Fast similarity search using BK-tree
  - hamming_distance(): Calculate Hamming distance between phashes
  - rebuild_bktree_index(): Rebuild index from all existing phashes
- bin/modules/ImagePhash.py (71 lines)
- bin/modules/PhashCorrelation.py - Enhanced with BK-tree search:
  - search_bktree_index() for efficient similarity search
- bin/tools/rebuild_phash_index.py (NEW - not in original PR)
- var/www/blueprints/objects_phash.py (74 lines)
  - /objects/phashes - List view
  - /objects/phash/post - Form handling
  - /objects/phash/range/json - Chart data
- var/www/templates/objects/phash/PhashDaterange.html (164 lines)
- tests/test_objects_phashes.py (339 lines) - Enhanced with BK-tree tests:
Modified Files (from PR #318)
- bin/lib/objects/Images.py
  - Phashs.py (as suggested in PR review)
- bin/lib/objects/Screenshots.py
  - Phashs.py (as suggested in PR review)
- bin/lib/correlations_engine.py
  - "phash": ["image", "phash"] to CORRELATION_TYPES_BY_OBJ
- bin/lib/objects/ail_objects.py
  - Phash in OBJECTS_CLASS dictionary
- bin/lib/ail_core.py
  - 'phash' to AIL_OBJECTS set
  - 'phash' to AIL_OBJECTS_CORRELATIONS_DEFAULT set
- configs/modules.cfg
  - [ImagePhash] section with queue configuration
  - [PhashCorrelation] section with queue configuration
- bin/LAUNCH.sh
  - ImagePhash module to launch sequence
  - PhashCorrelation module to launch sequence
- configs/core.cfg.sample
This pull request was created from Copilot chat.