
Polish Ruby language support #574

Merged
buger merged 1 commit into main from polish-ruby-language-support
Jun 4, 2026

@buger buger commented Jun 4, 2026

Collaborator

Summary

Polishes Ruby support across the CLI symbol/search paths and the LSP daemon tree-sitter path.

Implemented

  • Treat Ruby module, class, method, and singleton_method as first-class symbols.
  • Extract Ruby constant names from tree-sitter nodes so nested modules/classes resolve correctly.
  • Preserve nested Ruby symbols through body_statement wrappers and non-symbol wrapper nodes.
  • Support file#symbol extraction for nested Ruby classes, instance methods, and singleton methods.
  • Add Ruby parser-pool support in lsp-daemon and wire Ruby symbol kind/scope mapping.
  • Add Ruby UID language rules so LSP tree-sitter analysis can generate stable Ruby symbol IDs.
  • Tighten Ruby test detection coverage for test/, spec/, *_test.rb, *_spec.rb, Minitest test_ methods, and RSpec describe/context/it blocks.
  • Fix relative test-directory detection so paths like test/foo.rb are detected, not only paths containing /test/ with a leading parent component.

Tests

  • cargo test ruby --lib
  • cargo test --test ruby_outline_format_tests -- --nocapture
  • cargo test -p lsp-daemon ruby
  • cargo test -p lsp-daemon test_node_kind_to_symbol_kind_mapping --lib
  • Pre-commit hook also passed:
    • cargo fmt --all -- --check
    • cargo clippy --all-targets --all-features -- -D warnings
    • cargo test --lib
    • cargo test --test integration_tests

Dogfood

Tested against a real RuboCop checkout:

  • Path: /tmp/probe-ruby-dogfood/rubocop
  • Remote: https://github.com/rubocop/rubocop.git
  • Commit: 0553884 (2026-06-04 02:11:54 +0900, merge PR #15211)
  • Ruby/Rake/Gemfile-style files: 1,718

Validated:

  • probe symbols -o json /tmp/probe-ruby-dogfood/rubocop/lib/rubocop/cop/base.rb finds nested RuboCop, Cop, Base, documentation_url, and add_offense symbols.
  • probe extract .../base.rb#Base returns the actual class Base body.
  • probe extract .../base.rb#Base.add_offense returns the instance method body.
  • probe extract .../base.rb#Base.documentation_url returns the singleton method body.
  • probe search add_offense /tmp/probe-ruby-dogfood/rubocop -l ruby --max-results 3 -o plain returns Ruby method-context results from RuboCop lib/.
  • probe query -l ruby 'def $NAME' .../base.rb --max-results 3 -o json returns Ruby method matches.
  • Searching RuboCop spec/ for describe returns no results by default and returns RSpec matches with --allow-tests.
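The `file#symbol` targets used above can be pictured as a file path plus a dotted symbol path. The following is a minimal sketch of that split, not Probe's actual parser: `ExtractTarget` and `parse_extract_target` are hypothetical names, and splitting on the last `#` is an assumption.

```rust
/// Hypothetical parsed form of a target such as `lib/base.rb#Base.add_offense`.
#[derive(Debug, PartialEq)]
pub struct ExtractTarget {
    pub file: String,
    pub symbol_path: Vec<String>,
}

/// Splits `file#Outer.inner` into a file and its symbol segments.
/// Returns `None` when the `#` or any segment is missing.
pub fn parse_extract_target(spec: &str) -> Option<ExtractTarget> {
    // Assumption: the last '#' separates the file from the symbol path.
    let (file, symbol) = spec.rsplit_once('#')?;
    if file.is_empty() || symbol.is_empty() {
        return None;
    }
    let symbol_path: Vec<String> = symbol.split('.').map(str::to_string).collect();
    // Reject empty segments such as `Base.` or `.add_offense`.
    if symbol_path.iter().any(|segment| segment.is_empty()) {
        return None;
    }
    Some(ExtractTarget {
        file: file.to_string(),
        symbol_path,
    })
}
```

With this shape, `base.rb#Base.add_offense` yields the segments `Base` and `add_offense`, which a symbol finder can then resolve one nesting level at a time.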

Notes

  • probe lsp languages still reports Ruby external LSP as unavailable on this machine because solargraph is not installed. This PR adds the internal tree-sitter/LSP-daemon Ruby parser and symbol plumbing; it does not install or vendor the external Ruby LSP server.

probelabs Bot commented Jun 4, 2026

Contributor

PR Overview: Polish Ruby Language Support

Summary

This PR significantly enhances Ruby language support across Probe's CLI and LSP daemon by adding first-class symbol handling, improving nested symbol resolution, and tightening test detection. The changes span 10 files with 511 additions and 30 deletions.

Key Changes

1. LSP Daemon Tree-Sitter Integration

File: lsp-daemon/src/analyzer/tree_sitter_analyzer.rs (+110 lines)

  • Added Ruby parser pool support: "ruby" | "rb" => Some(tree_sitter_ruby::LANGUAGE)
  • Implemented map_ruby_node_to_symbol() method mapping Ruby node kinds to SymbolKind:
    • method → SymbolKind::Method
    • singleton_method → SymbolKind::Method
    • class → SymbolKind::Class
    • module → SymbolKind::Module
  • Added Ruby to creates_scope() for proper scope tracking in nested structures
  • Added comprehensive tests: test_ruby_parser_pool_and_node_mapping() and test_ruby_symbol_extraction_uses_parser_pool()
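The node-kind mapping listed above can be sketched as a plain `match`. This is an illustration only: the `SymbolKind` enum here is a trimmed-down stand-in for the daemon's real type, `map_ruby_node_kind` is a hypothetical free function rather than the actual `map_ruby_node_to_symbol()` method, and the node kinds in `creates_scope_ruby` are an assumption based on the "class/module boundaries" wording in this overview.

```rust
/// Stand-in for the daemon's symbol kind type (assumption: only the
/// variants mentioned in this overview are modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolKind {
    Method,
    Class,
    Module,
}

/// Maps a tree-sitter-ruby node kind to a symbol kind, or `None` for
/// nodes that are not symbols.
pub fn map_ruby_node_kind(kind: &str) -> Option<SymbolKind> {
    match kind {
        "method" | "singleton_method" => Some(SymbolKind::Method),
        "class" => Some(SymbolKind::Class),
        "module" => Some(SymbolKind::Module),
        _ => None,
    }
}

/// Assumed scope rule: classes and modules open a new scope.
pub fn creates_scope_ruby(kind: &str) -> bool {
    matches!(kind, "class" | "module")
}
```

Both instance and singleton methods collapse into the same kind here, which matches the mapping described in the list.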

File: lsp-daemon/src/lsp_database_adapter.rs (+31 lines, -5 deletions)

  • Added Ruby tree-sitter language support in get_tree_sitter_language()
  • Extended is_symbol_like() to recognize Ruby symbols: "module" | "method" | "singleton_method"
  • Updated node_kind_to_symbol_kind() to map Ruby node kinds:
    • Added "method" | "singleton_method" to SymbolKind::Method mapping
    • Added "module" to SymbolKind::Module mapping
  • Updated is_function_like() and is_container_like() to handle Ruby node kinds
  • Fixed impl_item mapping from SymbolKind::Class to SymbolKind::Impl
  • Added Ruby-specific test cases in test_node_kind_to_symbol_kind_mapping()

2. Ruby Language Rules for UID Generation

File: lsp-daemon/src/symbol/language_support.rs (+26 lines)

  • Implemented LanguageRules::ruby() with Ruby-specific configuration:
    • scope_separator: "::" (Ruby's namespace separator)
    • anonymous_prefix: "anon"
    • supports_overloading: false
    • case_sensitive: true
    • signature_normalization: RemoveParameterNames
    • file_extensions: vec!["rb".to_string(), "rake".to_string()]
    • signature_keywords: Ruby keywords (def, self, class, module, private, protected, public)

File: lsp-daemon/src/symbol/uid_generator.rs (+3 lines)

  • Registered Ruby language rules: rules.insert("ruby".to_string(), LanguageRules::ruby())
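As a rough picture of how these rules fit together, the sketch below rebuilds the configuration from the field names and values listed above and registers it in a map. The struct layout, the single-variant `SignatureNormalization` enum, and the `qualified_name` helper are assumptions for illustration, not the daemon's actual definitions.

```rust
use std::collections::HashMap;

/// Assumed shape: only the variant named in this overview is modelled.
#[derive(Debug, Clone, PartialEq)]
pub enum SignatureNormalization {
    RemoveParameterNames,
}

/// Assumed struct layout built from the fields listed in the overview.
#[derive(Debug, Clone)]
pub struct LanguageRules {
    pub scope_separator: String,
    pub anonymous_prefix: String,
    pub supports_overloading: bool,
    pub case_sensitive: bool,
    pub signature_normalization: SignatureNormalization,
    pub file_extensions: Vec<String>,
    pub signature_keywords: Vec<String>,
}

impl LanguageRules {
    pub fn ruby() -> Self {
        let keywords = ["def", "self", "class", "module", "private", "protected", "public"];
        LanguageRules {
            scope_separator: "::".to_string(),
            anonymous_prefix: "anon".to_string(),
            supports_overloading: false,
            case_sensitive: true,
            signature_normalization: SignatureNormalization::RemoveParameterNames,
            file_extensions: vec!["rb".to_string(), "rake".to_string()],
            signature_keywords: keywords.iter().map(|k| k.to_string()).collect(),
        }
    }
}

/// Registers the Ruby rules under the "ruby" key.
pub fn default_rules() -> HashMap<String, LanguageRules> {
    let mut rules = HashMap::new();
    rules.insert("ruby".to_string(), LanguageRules::ruby());
    rules
}

/// Hypothetical helper: joins scope components with the language separator.
pub fn qualified_name(rules: &LanguageRules, parts: &[&str]) -> String {
    parts.join(rules.scope_separator.as_str())
}
```

Under these assumptions, the nested scope `RuboCop`, `Cop`, `Base` would be rendered as `RuboCop::Cop::Base`, which is the kind of stable qualified name a UID can be derived from.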

3. Enhanced Symbol Extraction

File: src/extract/symbol_finder.rs (+45 lines)

  • Added "constant" node kind to get_qualified_name() for Ruby constant extraction
  • Added "constant" to find_all_symbol_nodes() identifier detection
  • Added comprehensive test: test_ruby_class_and_nested_method_extraction() validating:
    • Nested class lookup: Base class found within RuboCop::Cop modules
    • Instance method extraction: Base.add_offense returns method body
    • Singleton method extraction: Base.documentation_url returns def self.documentation_url

File: src/extract/symbols.rs (+70 lines, -9 deletions)

  • Added "module" to is_container_node() for Ruby module detection
  • Added "body_statement" to is_container_node() to handle Ruby's body wrapper nodes
  • Refactored collect_symbols() to preserve nested symbols through non-symbol wrapper nodes
  • Updated collect_children_symbols() to handle "body_statement" and other wrapper nodes
  • Added "constant" to extract_symbol_name() for Ruby constant names
  • Extended normalize_kind() to recognize Ruby node kinds:
    • "method" | "singleton_method" → "method"
  • Added test: test_extract_nested_ruby_symbols() validating full symbol hierarchy extraction
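To illustrate why wrapper nodes matter, here is a small sketch of collecting symbols through `body_statement` wrappers on a toy tree. The `Node` and `Symbol` types are hypothetical stand-ins for tree-sitter nodes and Probe's symbol type, and normalizing `class` and `module` to themselves is an assumption; only the `method` normalization is taken from the list above.

```rust
/// Toy AST node standing in for a tree-sitter node (hypothetical).
pub struct Node {
    pub kind: &'static str,
    pub name: Option<&'static str>,
    pub children: Vec<Node>,
}

/// Toy extracted symbol (hypothetical).
#[derive(Debug)]
pub struct Symbol {
    pub name: String,
    pub kind: String,
    pub children: Vec<Symbol>,
}

/// Normalizes Ruby node kinds; `None` marks a non-symbol (wrapper) node.
fn normalize_kind(kind: &str) -> Option<&'static str> {
    match kind {
        "class" => Some("class"),
        "module" => Some("module"),
        "method" | "singleton_method" => Some("method"),
        _ => None,
    }
}

/// Collects symbols, hoisting anything found under wrapper nodes such as
/// `body_statement` up to the nearest enclosing symbol.
pub fn collect_symbols(node: &Node) -> Vec<Symbol> {
    let nested: Vec<Symbol> = node.children.iter().flat_map(collect_symbols).collect();
    match (normalize_kind(node.kind), node.name) {
        (Some(kind), Some(name)) => vec![Symbol {
            name: name.to_string(),
            kind: kind.to_string(),
            children: nested,
        }],
        // Wrapper or unnamed node: pass nested symbols through unchanged.
        _ => nested,
    }
}
```

Because wrappers pass their results upward instead of dropping them, a method inside `class Base` inside `module RuboCop` still ends up as a grandchild of `RuboCop` in the output.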

4. Ruby Language Implementation

File: src/language/ruby.rs (+148 lines, -4 deletions)

  • Added helper methods:
    • is_container(): Detects "class" | "module"
    • is_method_like(): Detects "method" | "singleton_method"
    • is_symbol_like(): Combines container and method detection
    • first_line_signature(): Extracts signature from first line
  • Implemented is_symbol_node() override for Ruby
  • Implemented find_parent_function() to walk up AST to find parent method
  • Implemented get_symbol_signature() for Ruby symbols
  • Added comprehensive tests:
    • detects_minitest_style_test_methods(): Validates test_ prefix detection
    • detects_rspec_block_calls(): Validates describe, context, it detection
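The name-based checks described above can be sketched without an AST. The functions below are hypothetical simplifications that operate on plain strings; the real `is_test_node()` works on tree-sitter nodes, and taking the trimmed first line as the signature is an assumption about `first_line_signature()`.

```rust
/// Minitest convention: test methods are named with a `test_` prefix.
pub fn is_minitest_method(name: &str) -> bool {
    name.starts_with("test_")
}

/// RSpec convention: `describe`, `context`, and `it` calls open test blocks.
pub fn is_rspec_block_call(method_name: &str) -> bool {
    matches!(method_name, "describe" | "context" | "it")
}

/// Assumed behavior: the signature is the trimmed first line of the
/// symbol's source text.
pub fn first_line_signature(source: &str) -> Option<&str> {
    source.lines().next().map(str::trim)
}
```

A prefix check with the trailing underscore avoids flagging ordinary methods such as `testing` or `tested?`.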

5. Test Detection Improvements

File: src/language/test_detection.rs (+35 lines, -11 deletions)

  • Fixed relative test directory detection to use path.components() instead of string matching
  • Now correctly detects paths like test/foo.rb (not just project/test/foo.rb)
  • Added tests:
    • detects_ruby_test_file_conventions(): Validates test/, spec/, *_test.rb, *_spec.rb
    • does_not_overmatch_non_test_ruby_files(): Ensures non-test files aren't misclassified
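The component-based check can be sketched with `std::path` alone. This is an approximation under stated assumptions, not the code in `test_detection.rs`: the function names are hypothetical, and only the `test`/`spec` directories and the `_test.rb`/`_spec.rb` suffixes named in this PR are covered.

```rust
use std::path::{Component, Path};

/// True when any directory component is exactly `test` or `spec`.
/// Works for relative paths like `test/foo.rb` because it inspects
/// components rather than searching for the substring `/test/`.
pub fn in_test_directory(path: &Path) -> bool {
    path.parent().map_or(false, |dir| {
        dir.components().any(|component| match component {
            Component::Normal(name) => matches!(name.to_str(), Some("test" | "spec")),
            _ => false,
        })
    })
}

/// True for Ruby files whose stem ends in `_test` or `_spec`.
pub fn has_ruby_test_suffix(path: &Path) -> bool {
    let is_ruby = path.extension().and_then(|ext| ext.to_str()) == Some("rb");
    let stem = path.file_stem().and_then(|s| s.to_str()).unwrap_or("");
    is_ruby && (stem.ends_with("_test") || stem.ends_with("_spec"))
}

pub fn is_ruby_test_file(path: &Path) -> bool {
    in_test_directory(path) || has_ruby_test_suffix(path)
}
```

Comparing whole components is also what keeps names like `contest.rb` or `latest.rb` from matching, since neither contains a `test` directory component or a `_test` suffix.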

File: src/search/file_list_cache.rs (+42 lines)

  • Added test: test_ruby_test_files_respect_allow_tests() validating:
    • _test.rb and _spec.rb files excluded without --allow-tests
    • Test files included with --allow-tests flag
    • Non-test Ruby files always included
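The filtering behavior that test validates can be pictured as a single predicate over the file list. The sketch below is hypothetical and deliberately simplified: `filter_ruby_files` and `looks_like_test` are illustrative names, and only the filename suffixes are checked here.

```rust
/// Simplified suffix check (assumption: directory rules are ignored here).
fn looks_like_test(file: &str) -> bool {
    file.ends_with("_test.rb") || file.ends_with("_spec.rb")
}

/// Keeps every file when `allow_tests` is set; otherwise drops test files.
pub fn filter_ruby_files<'a>(files: &[&'a str], allow_tests: bool) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|file| allow_tests || !looks_like_test(file))
        .collect()
}
```

Non-test files pass through in both modes, so the flag only ever widens the result set.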

6. Test Fix

File: tests/ruby_outline_format_tests.rs (+1 line, -1 deletion)

  • Fixed test filename from keyword_test.rb to keyword_highlighting.rb to avoid false test detection

Architecture & Impact

Component Relationships

graph TD
    A[Ruby Source Code] --> B[tree-sitter-ruby Parser]
    B --> C[ParserPool lsp-daemon]
    B --> D[ParserPool CLI]
    C --> E[TreeSitterAnalyzer]
    E --> F[map_ruby_node_to_symbol]
    E --> G[creates_scope ruby]
    C --> H[LspDatabaseAdapter]
    H --> I[node_kind_to_symbol_kind]
    H --> J[is_symbol_like ruby]
    D --> K[SymbolFinder]
    K --> L[find_symbol_in_file]
    K --> M[get_qualified_name constant]
    D --> N[SymbolExtractor]
    N --> O[collect_symbols body_statement]
    N --> P[extract_symbol_name constant]
    Q[LanguageRules ruby] --> R[SymbolUIDGenerator]
    R --> S[Stable Ruby Symbol IDs]
    T[RubyLanguage impl] --> U[is_test_node Minitest/RSpec]
    T --> V[find_parent_function]
    T --> W[get_symbol_signature]
    X[test_detection] --> Y[is_test_file components]

Data Flow: Nested Ruby Symbol Extraction

sequenceDiagram
    participant CLI as probe extract
    participant SF as SymbolFinder
    participant AL as RubyLanguage
    participant TS as tree-sitter-ruby
    participant SS as SymbolExtractor
    
    CLI->>SF: find_symbol_in_file("Base.add_offense")
    SF->>TS: parse Ruby code
    TS-->>SF: AST with module/class/method nodes
    SF->>AL: is_acceptable_parent(method node)
    AL-->>SF: true (method is acceptable)
    SF->>SF: Traverse nested: RuboCop → Cop → Base → add_offense
    SF->>SS: extract_symbols()
    SS->>SS: collect_symbols() through body_statement
    SS->>SS: extract_symbol_name() using constant node
    SS-->>SF: Symbol hierarchy with Base.add_offense
    SF-->>CLI: Method node with code body

Scope Discovery & Context Expansion

Affected Modules

  1. LSP Daemon Analysis Pipeline

    • Tree-sitter parser pool now includes Ruby
    • Symbol kind mapping handles Ruby-specific node types
    • Scope creation tracks Ruby class/module boundaries
  2. Symbol UID Generation

    • Ruby now has stable, language-specific UID rules
    • Uses :: scope separator for proper namespacing
    • Handles anonymous symbols (procs/lambdas) with position-based UIDs
  3. CLI Symbol Extraction

    • Nested symbol resolution works for Class.method syntax
    • Constant extraction enables proper nested module/class resolution
    • Body statement handling preserves symbols through wrapper nodes
  4. Test Detection

    • File-level: test/, spec/, *_test.rb, *_spec.rb
    • AST-level: Minitest test_ methods, RSpec describe/it/context blocks
    • Relative path handling fixed for edge cases

Related Files (Not Modified)

Based on the search analysis, these files are related but not changed in this PR:

  • lsp-daemon/src/language_detector.rs - Ruby language detection (already supported)
  • lsp-daemon/src/fqn.rs - Ruby scope separator :: (already defined)
  • src/language/factory.rs - Ruby language factory registration (already exists)
  • src/search/result_ranking.rs - Singleton method relevance boost (already implemented)
  • probe/Cargo.toml - tree-sitter-ruby dependency (already declared)
  • lsp-daemon/Cargo.toml - tree-sitter-ruby dependency (already declared)

Testing Coverage

The PR includes comprehensive tests:

  1. Parser Pool Tests: Validates Ruby parser availability and node mapping
  2. Symbol Extraction Tests: Nested class/module/method extraction
  3. Test Detection Tests: Minitest and RSpec pattern recognition
  4. File List Tests: --allow-tests flag behavior
  5. Dogfooding: manual validation against a real RuboCop checkout (see Validation below)

Validation

The author tested against a real RuboCop checkout (1,718 Ruby/Rake/Gemfile-style files) and validated:

  • probe symbols finds nested symbols correctly
  • probe extract with file#symbol syntax works for nested classes and methods
  • probe search returns Ruby method-context results
  • probe query matches Ruby method definitions
  • Test file filtering works with --allow-tests

References

Modified Files

  • lsp-daemon/src/analyzer/tree_sitter_analyzer.rs:104-107,454-457,612-627,947-952,1336-1457
  • lsp-daemon/src/lsp_database_adapter.rs:838-843,975-978,1269-1286,2500-2503,2545-2548,3262-3310
  • lsp-daemon/src/symbol/language_support.rs:165-193,597-619
  • lsp-daemon/src/symbol/uid_generator.rs:94-97
  • src/extract/symbol_finder.rs:25-28,85-88,835-881
  • src/extract/symbols.rs:80-102,149-151,190-217,275-278,308-321,637-694
  • src/language/ruby.rs:10-217
  • src/language/test_detection.rs:137-172,177-210
  • src/search/file_list_cache.rs:918-961
  • tests/ruby_outline_format_tests.rs:941

Metadata

  • Review Effort: 3 / 5
  • Primary Label: enhancement

Powered by Visor from Probelabs

Last updated: 2026-06-04T12:57:39.654Z | Triggered by: pr_opened | Commit: 4be42ab

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs Bot commented Jun 4, 2026

Contributor

Security Issues (2)

Severity Location Issue
🟡 Warning src/extract/symbols.rs:154
The change from specific node type matching to generic 'child.child_count() > 0' for recursive symbol collection could introduce security risks. This allows traversal of ANY non-leaf node in the AST, potentially exposing the system to maliciously crafted code with unusual tree structures that could cause excessive recursion or memory consumption.
💡 Suggestion: Revert to explicit node type matching or add additional validation to ensure only known safe node types are traversed recursively. The current implementation could be exploited by specially crafted source files with unusual AST structures.
🟡 Warning src/language/test_detection.rs:153
The change from string-based path.contains() to path.components().any() improves security by properly handling path components, but the implementation could still be vulnerable to path traversal attempts through specially crafted directory names. The current implementation checks if ANY component matches test directory names, which could be exploited if an attacker can create directories with names like 'test' or 'spec' in unexpected locations.
💡 Suggestion: Add validation to ensure test directory detection only applies to well-known test directory locations and not arbitrary directory names that could be exploited for path traversal attacks. Consider adding depth limits or whitelisting of acceptable test directory patterns.

Architecture Issues (1)

Severity Location Issue
🟠 Error contract:0
Output schema validation failed: must have required property 'issues'

Quality Issues (9)

Severity Location Issue
🟡 Warning src/language/test_detection.rs:154
Test coverage lacks negative test cases for Ruby test file detection. While positive test cases exist (test/user_service.rb, *_test.rb, *_spec.rb), there are no assertions validating that non-test files like 'keyword_highlighting.rb', 'contest.rb', or 'latest.rb' are correctly identified as non-test files. This could miss false positive bugs.
💡 Suggestion: Add negative test cases with assertions using assert!() to validate non-test Ruby files are correctly rejected. The test suite should include files that should NOT be detected as test files.
🟡 Warning src/language/ruby.rs:35
The is_test_node method uses unwrap_or("") on utf8_text() which silently ignores encoding errors. If UTF-8 parsing fails, the method continues with an empty string, potentially leading to incorrect test detection or missed test nodes.
💡 Suggestion: Handle UTF-8 parsing errors explicitly. Return false when encoding fails rather than silently continuing with an empty string, or log a debug message for troubleshooting.
🟡 Warning src/extract/symbols.rs:154
Changed from explicit node kind matching to generic child.child_count() > 0 check. This could process unintended node types (ERROR nodes, comments, etc.) that happen to have children, potentially introducing unexpected behavior or performance overhead.
💡 Suggestion: Keep the explicit node kind matching for known container types, or add a filter to exclude ERROR nodes and other non-symbol nodes. The generic approach may be too permissive.
🟡 Warning src/extract/symbols.rs:193
Added early return optimization that skips processing if symbols.is_empty(). This could cause issues if a valid body node contains no symbols but should still be traversed for nested structures. The early return may prevent discovery of symbols in deeper nested nodes.
💡 Suggestion: Verify this optimization doesn't break symbol extraction for nested structures. Consider whether an empty body should still be traversed, or add tests validating nested symbol discovery still works correctly.
🟡 Warning src/extract/symbols.rs:218
Added early return optimization that skips processing if symbols.is_empty(). Similar to the body processing change, this could prevent discovery of symbols in nested structures if intermediate nodes return empty results.
💡 Suggestion: Ensure this optimization doesn't break symbol extraction for deeply nested Ruby classes/modules. Add tests validating that symbols at multiple nesting levels are still discovered correctly.
🟡 Warning src/language/ruby.rs:149
Test validates implementation details (checking for 'true' value) rather than behavior. The test 'assert true' doesn't validate the actual method behavior - it would pass even if the method logic was incorrect but still returned 'true'. The test should verify the method actually performs authentication logic.
💡 Suggestion: Restructure test to validate actual behavior: test that the method performs meaningful authentication checks, not just that it returns 'true'. Consider testing with different input scenarios that should produce different outcomes.
🟡 Warning src/language/ruby.rs:192
Test uses 'expect(true).to eq(true)' which validates implementation details rather than behavior. This assertion would pass even if the test logic was completely broken but still evaluated to 'true'. The test should verify the RSpec block actually sets up a test context.
💡 Suggestion: Restructure test to validate actual RSpec behavior: verify the 'it' block contains test assertions or setup code, not just that it evaluates to 'true'. Consider testing that the block is properly structured as a test.
🟡 Warning src/extract/symbol_finder.rs:868
Test assertion checks for 'current_offenses' string in method body without explaining why this specific value should be present. The test validates implementation details rather than behavior - it would pass if the method contained any string matching 'current_offenses' regardless of actual method logic.
💡 Suggestion: Add a comment explaining why 'current_offenses' should appear in the method body, or restructure the test to validate that the method actually performs offense tracking behavior rather than just containing a specific string.
🟡 Warning src/extract/symbols.rs:654
Test uses hardcoded method names 'documentation_url' and 'add_offense' in assertions without explaining why these specific methods should exist. The test validates implementation details rather than behavior - it checks for specific method names rather than validating that the class has both singleton and instance methods.
💡 Suggestion: Add comments explaining the expected class structure and why these specific methods are important, or restructure to validate that the class has both singleton and instance methods without hardcoding specific names.

Last updated: 2026-06-04T12:39:18.419Z | Triggered by: pr_opened | Commit: 4be42ab

@buger buger merged commit 1d5be4a into main Jun 4, 2026
18 of 19 checks passed
