fix(core): prefer current repo when resolving amber home path#4263
fix(core): prefer current repo when resolving amber home path#4263carloea2 wants to merge 3 commits intoapache:mainfrom
Conversation
|
@chenlica Can you assign a reviewer? Thanks. |
|
@aglinxinyuan @Xiao-zhen-Liu Please review this PR. |
There was a problem hiding this comment.
Pull request overview
Improves Utils.amberHomePath resolution to be more deterministic in environments where multiple sibling repositories may contain an amber/ directory, reducing the chance of selecting the wrong repo when launched from a non-amber working directory.
Changes:
- Adds a two-pass search strategy to prefer a “closest” match before falling back to any match.
- Adds safeguards for filesystem-root cases and normalizes returned paths via
toRealPath(). - Ensances resource safety by closing
Files.walk(...)viaUsing.resource(...).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| val closestPrefix: Option[Path] = | ||
| Using.resource(Files.walk(parent, 2)) { stream => | ||
| stream | ||
| .iterator() | ||
| .asScala | ||
| .filter(path => isAmberHomePath(path)) | ||
| .map(_.toRealPath()) // normalize after filtering | ||
| .filter(path => path.startsWith(currentWorkingDirectory)) | ||
| .maxByOption(_.getNameCount) // deepest prefix = closest ancestor | ||
| } | ||
|
|
||
| closestPrefix.getOrElse { | ||
| // Pass 2: fallback to any valid match | ||
| val anyMatch = | ||
| Using.resource(Files.walk(parent, 2)) { stream => | ||
| stream | ||
| .filter((path: Path) => isAmberHomePath(path)) | ||
| .findAny() | ||
| } |
There was a problem hiding this comment.
Files.walk(parent, 2) is executed twice (pass 1 and pass 2). Since directory walking can be relatively expensive (and can observe different FS state between passes), consider walking once, collecting the matching candidates, and then applying the preferred-selection logic + fallback in-memory.
| val parent = Option(currentWorkingDirectory.getParent).getOrElse { | ||
| throw new RuntimeException( | ||
| "Finding texera home path failed. Current working directory is " + currentWorkingDirectory | ||
| s"Cannot search for texera home from filesystem root: $currentWorkingDirectory" | ||
| ) |
There was a problem hiding this comment.
The new exception message refers to "texera home" even though this method is computing amberHomePath and the scaladoc says "amber home directory". Aligning the wording (e.g., "Amber home" or "Texera/Amber home") would make this error easier to understand when it appears in logs.
| import java.nio.file.{Files, Path, Paths} | ||
| import scala.jdk.CollectionConverters._ | ||
| import scala.util.Using |
There was a problem hiding this comment.
These new imports are placed inside the Utils object, while other files in this package keep imports at the top of the file. Consider moving them to the top-level import section for consistency and to make dependencies easier to spot.
| lazy val amberHomePath: Path = { | ||
| val currentWorkingDirectory = Paths.get(".").toRealPath() | ||
| // check if the current directory is the amber home path | ||
|
|
||
| if (isAmberHomePath(currentWorkingDirectory)) { | ||
| currentWorkingDirectory | ||
| } else { | ||
| // from current path's parent directory, search its children to find amber home path | ||
| // current max depth is set to 2 (current path's siblings and direct children) | ||
| val searchChildren = Files | ||
| .walk(currentWorkingDirectory.getParent, 2) | ||
| .filter((path: Path) => isAmberHomePath(path)) | ||
| .findAny | ||
| if (searchChildren.isPresent) { | ||
| searchChildren.get | ||
| } else { | ||
| val parent = Option(currentWorkingDirectory.getParent).getOrElse { | ||
| throw new RuntimeException( | ||
| "Finding texera home path failed. Current working directory is " + currentWorkingDirectory | ||
| s"Cannot search for texera home from filesystem root: $currentWorkingDirectory" | ||
| ) | ||
| } | ||
|
|
||
| // Pass 1: prefer the closest prefix (deepest ancestor) of currentWorkingDirectory | ||
| val closestPrefix: Option[Path] = | ||
| Using.resource(Files.walk(parent, 2)) { stream => | ||
| stream | ||
| .iterator() | ||
| .asScala | ||
| .filter(path => isAmberHomePath(path)) | ||
| .map(_.toRealPath()) // normalize after filtering | ||
| .filter(path => path.startsWith(currentWorkingDirectory)) | ||
| .maxByOption(_.getNameCount) // deepest prefix = closest ancestor | ||
| } | ||
|
|
||
| closestPrefix.getOrElse { | ||
| // Pass 2: fallback to any valid match | ||
| val anyMatch = | ||
| Using.resource(Files.walk(parent, 2)) { stream => | ||
| stream | ||
| .filter((path: Path) => isAmberHomePath(path)) | ||
| .findAny() | ||
| } | ||
|
|
||
| if (anyMatch.isPresent) { | ||
| anyMatch.get().toRealPath() | ||
| } else { | ||
| throw new RuntimeException( | ||
| s"Finding texera home path failed. Current working directory is $currentWorkingDirectory" | ||
| ) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This change alters path-resolution behavior in a way that’s easy to regress (especially with multiple sibling repos). Since the amber module already has ScalaTest coverage, adding a focused unit test that creates a temp directory tree and sets user.dir to validate the selection logic would help prevent future nondeterministic failures.
| // Pass 1: prefer the closest prefix (deepest ancestor) of currentWorkingDirectory | ||
| val closestPrefix: Option[Path] = | ||
| Using.resource(Files.walk(parent, 2)) { stream => | ||
| stream | ||
| .iterator() | ||
| .asScala | ||
| .filter(path => isAmberHomePath(path)) | ||
| .map(_.toRealPath()) // normalize after filtering | ||
| .filter(path => path.startsWith(currentWorkingDirectory)) | ||
| .maxByOption(_.getNameCount) // deepest prefix = closest ancestor | ||
| } |
There was a problem hiding this comment.
The comment/variable name implies selecting the closest prefix/ancestor of currentWorkingDirectory, but the predicate uses path.startsWith(currentWorkingDirectory) which instead selects matches under the working directory. If the intent is deepest ancestor (e.g., when CWD is inside an amber/... subtree), the startsWith direction should be reversed (or handle both ancestor/descendant cases explicitly) to avoid selecting the wrong candidate in edge cases with multiple matches.
|
Just curious, who requested copilot review? |
I requested it. Please address the comments from Copilot before I have a pass. |
What changes were proposed in this PR?
This PR fixes an issue in
amberHomePathresolution where the previous implementation could select the wrong repo when multiple sibling directories under the same parent satisfiedisAmberHomePath(...).texera/amber/src/main/scala/org/apache/texera/amber/engine/common/Utils.scala
Lines 39 to 59 in ac909a0
Previously, if the current working directory was not itself recognized as the Amber home path, the code would walk the parent directory and return
findAny()from matching paths. In environments with multiple Texera/Amber repos under the same parent, this could resolve to an unrelated sibling repo instead of the repo the process was launched from.This PR changes the logic to:
toRealPath()Files.walk(...)safely usingUsing.resource(...)This makes Amber home path detection more deterministic and prevents accidentally selecting the wrong engine/repo in multi-repo local setups.
Any related issues, documentation, discussions?
Closes #4262
How was this PR tested?
Manually tested with a local directory layout where multiple sibling repos satisfy
isAmberHomePath(...).Example setup:
Was this PR authored or co-authored using generative AI tooling?
No.