Conversation
GiGL Automation@ 20:29:04UTC : Starting to build base images for CUDA and CPU. |
GiGL Automation@ 20:42:54UTC : Starting to build base images for CUDA and CPU. |
GiGL Automation@ 20:48:49UTC : Starting to build base images for CUDA and CPU. |
GiGL Automation@ 20:58:39UTC : Built and pushed new images:
Updated |
|
/unit_test |
GiGL Automation@ 21:04:18UTC : 🔄 @ 21:14:07UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 21:04:20UTC : 🔄 @ 22:01:34UTC : ✅ Workflow completed successfully. |
|
/unit_test |
|
/e2e_test |
GiGL Automation@ 21:11:42UTC : 🔄 @ 22:19:40UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 21:11:44UTC : 🔄 @ 21:19:59UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 21:11:48UTC : 🔄 @ 21:21:24UTC : ❌ Workflow failed. |
|
/e2e_test |
GiGL Automation@ 21:43:04UTC : 🔄 @ 23:16:09UTC : ✅ Workflow completed successfully. |
kmontemayor2-sc
left a comment
There was a problem hiding this comment.
Thanks for all the excellent work Matt!
I think the only thing left blocking on my side is adding the cpp test to our merge suite https://github.com/Snapchat/GiGL/pull/558/changes#r3118534359
|
|
||
| COPY CMakeLists.txt CMakeLists.txt | ||
| COPY MANIFEST.in MANIFEST.in | ||
| COPY README.md README.md |
There was a problem hiding this comment.
QQ any reason we copy this over? If we need it, can we add a comment saying why?
| | Priority | Pattern | Group | | ||
| | -------- | ------------------------------------ | ------------------------------------- | | ||
| | 1 | `.*` | Local project headers (first) | | ||
| | 2 | `^"(llvm\|llvm-c\|clang\|clang-c)/"` | LLVM/Clang internal headers | |
There was a problem hiding this comment.
hmmm, this seems very specific to projects relying on LLVM / clang internals, I don't think we're really going to use those internals? I know we'll use the tools, but do we forsee needing their imports?
| | Type template parameters | `CamelCase` | `NodeType` | | ||
| | Functions, methods | `camelBack` | `sampleNeighbors` | | ||
| | Variables, parameters, members | `camelBack` | `numNodes` | | ||
| | Private/protected members | `camelBack` with `_` prefix | `_nodeFeatures` | |
There was a problem hiding this comment.
Hmmm, I feel like being able to more easily differentiate between MyClass and myFunction my be useful (and is more inline with our Python code base). But I wont push back here anymore,
|
|
||
| | Identifier kind | Convention | Example | | ||
| | --------------------------------------------------------- | --------------------------- | ----------------- | | ||
| | Classes, enums, unions | `CamelCase` | `DistDataset` | |
There was a problem hiding this comment.
You know I was always under the impression first cap is PascalCase, and camelCase doesn't have an initialCapital but then wiki says it doesn't really matter. https://en.wikipedia.org/wiki/Camel_case#Description
Though since we use both (for different things) maybe we should differentiate?
| ## Formatting (`.clang-format`) | ||
|
|
||
| The style is based on LLVM with the following notable deviations: | ||
|
|
||
| ### Line length | ||
|
|
||
| ``` | ||
| ColumnLimit: 120 | ||
| ``` | ||
|
|
||
| 120 columns rather than the LLVM default of 80. ML and graph code tends to have longer identifiers and nested template | ||
| types; 120 gives enough room without forcing awkward wraps. | ||
|
|
||
| ### Indentation and braces | ||
|
|
||
| ``` | ||
| IndentWidth: 4 | ||
| BreakBeforeBraces: Attach # K&R / "same-line" style | ||
| UseTab: Never | ||
| IndentCaseLabels: true # case labels indented inside switch | ||
| NamespaceIndentation: None # namespace bodies not indented | ||
| ``` | ||
|
|
||
| ### Pointer and reference alignment | ||
|
|
||
| ``` | ||
| PointerAlignment: Left | ||
| ``` | ||
|
|
||
| Pointers bind to the type, not the name: `int* x`, not `int *x`. | ||
|
|
||
| ### Parameter and argument wrapping | ||
|
|
||
| ``` | ||
| BinPackArguments: false | ||
| BinPackParameters: false | ||
| ``` | ||
|
|
||
| When a function call or declaration doesn't fit on one line, every argument/parameter gets its own line. Mixed | ||
| "bin-packing" (some on one line, some wrapped) is not allowed. | ||
|
|
||
| ### Templates | ||
|
|
||
| ``` | ||
| AlwaysBreakTemplateDeclarations: true | ||
| ``` | ||
|
|
||
| `template <...>` always appears on its own line, keeping the declaration signature visually separate from the template | ||
| header. | ||
|
|
||
| ### Include ordering | ||
|
|
||
| Includes are sorted and split into three priority groups: | ||
|
|
||
| | Priority | Pattern | Group | | ||
| | -------- | ------------------------------------ | ------------------------------------- | | ||
| | 1 | `.*` | Local project headers (first) | | ||
| | 2 | `^"(llvm\|llvm-c\|clang\|clang-c)/"` | LLVM/Clang internal headers | | ||
| | 3 | `^(<\|"(gtest\|isl\|json)/)` | System and third-party headers (last) | | ||
|
|
||
| ### Raw string formatting | ||
|
|
||
| Raw string literals with the `pb` delimiter (e.g. `R"pb(...)pb"`) are formatted as TextProto using Google style, | ||
| matching the protobuf idiom used throughout the codebase. | ||
|
|
||
| ______________________________________________________________________ | ||
|
|
||
| ## Static Analysis (`.clang-tidy`) | ||
|
|
||
| ### Check philosophy | ||
|
|
||
| A broad set of check families is enabled to catch bugs, enforce modern C++ idioms, and maintain readability. All | ||
| warnings are errors — there is no "warning-only" category. | ||
|
|
||
| Enabled families: | ||
|
|
||
| | Family | What it covers | | ||
| | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `boost-use-to-string` | Prefer `std::to_string` over `boost::lexical_cast` for numeric conversions | | ||
| | `bugprone-*` | Common programming mistakes: dangling handles, suspicious string construction, assert side effects, etc. | | ||
| | `cert-*` | CERT secure coding rules for error handling (`err34-c`), floating-point loops (`flp30-c`), and RNG seeding (`msc32-c`, `msc50/51-cpp`) | | ||
| | `clang-diagnostic-*` | Compiler diagnostic warnings surfaced as lint checks (e.g. `-Wall`, `-Wextra` violations) | | ||
| | `cppcoreguidelines-*` | C++ Core Guidelines: no raw `malloc`, no union member access, no object slicing, safe downcasts | | ||
| | `google-*` | Google C++ style: explicit constructors, no global names in headers, safe `memset` usage | | ||
| | `hicpp-exception-baseclass` | All thrown exceptions must derive from `std::exception` | | ||
| | `misc-*` | Miscellaneous: header-only definitions, suspicious enum usage, throw-by-value/catch-by-reference, etc. | | ||
| | `modernize-*` | Modernize to C++11/14/17: `nullptr`, range-based for, `make_unique`, `using` aliases, etc. | | ||
| | `performance-*` | Unnecessary copies, inefficient string ops, missed `emplace`, type promotions in math functions | | ||
| | `readability-*` | Naming conventions, braces around statements, boolean simplification, function size limits | | ||
|
|
||
| ### Disabled checks | ||
|
|
||
| Some checks in the above families are disabled where they produce excessive noise or conflict with common patterns in | ||
| this codebase: | ||
|
|
||
| | Disabled check | Reason | | ||
| | ----------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ||
| | `bugprone-easily-swappable-parameters` | Tensor and sampler APIs legitimately have many adjacent same-typed parameters | | ||
| | `bugprone-implicit-widening-of-multiplication-result` | Crashes clang-tidy 15 on a construct in `ATen/core/dynamic_type.h` (upstream LLVM bug). Re-enable when upgrading past clang-tidy 15. | | ||
| | `bugprone-narrowing-conversions` | Too noisy in ML code mixing `int`/`int64_t`/`size_t` for tensor dimensions | | ||
| | `misc-confusable-identifiers` | Performs an O(n²) comparison of all identifiers in scope to detect Unicode homoglyphs. PyTorch headers introduce thousands of identifiers, making this check account for ~70% of total lint time. All identifiers in this codebase are standard ASCII. | | ||
| | `misc-const-correctness` | Produces false positives with pybind11 types whose mutation happens through `operator[]` (which is non-const). The check incorrectly suggests `const` on variables that are mutated. | | ||
| | `misc-no-recursion` | Recursive graph algorithms are intentional | | ||
| | `modernize-avoid-c-arrays` | C arrays are needed for pybind11 and C-interop code | | ||
| | `modernize-use-trailing-return-type` | Trailing return types (`auto f() -> T`) are only useful when the return type depends on template params. Requiring them everywhere is non-standard and reduces readability. | | ||
| | `readability-avoid-const-params-in-decls` | Incorrectly fires on `const T&` parameters in multi-line declarations (clang-tidy 15 bug). The check is meant for top-level const on by-value params, which is a separate, valid concern. | | ||
| | `readability-container-contains` | `.contains()` requires C++20; the codebase builds with C++17 | | ||
| | `readability-identifier-length` | Short loop variables (`i`, `j`, `k`) are idiomatic | | ||
| | `readability-function-cognitive-complexity` | Algorithmic code often requires nesting that is inherent to the problem structure. Enforcing an arbitrary complexity ceiling discourages clarity and encourages artificial decomposition. | | ||
| | `readability-magic-numbers` | Literal constants are common in ML code (e.g. feature dimensions) | | ||
|
|
||
| ### Naming conventions | ||
|
|
||
| Enforced via `readability-identifier-naming`: | ||
|
|
||
| | Identifier kind | Convention | Example | | ||
| | --------------------------------------------------------- | --------------------------- | ----------------- | | ||
| | Classes, enums, unions | `CamelCase` | `DistDataset` | | ||
| | Type template parameters | `CamelCase` | `NodeType` | | ||
| | Functions, methods | `camelBack` | `sampleNeighbors` | | ||
| | Variables, parameters, members | `camelBack` | `numNodes` | | ||
| | Private/protected members | `camelBack` with `_` prefix | `_nodeFeatures` | | ||
| | Constants (`constexpr`, `const` globals, class constants) | `CamelCase` with `k` prefix | `kMaxBatchSize` | | ||
|
|
||
| ### Key option tuning | ||
|
|
||
| | Option | Value | Effect | | ||
| | ---------------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | `WarningsAsErrors` | `*` | Every check failure is a hard error in CI | | ||
| | `HeaderFilterRegex` | `.*/gigl/csrc/.*` | Scopes checks to our own headers. Using `.*` causes clang-tidy to report warnings from every PyTorch/pybind11 header it parses, flooding output with thousands of third-party issues. | | ||
| | `FormatStyle` | `none` | clang-tidy does not auto-reformat; use clang-format separately | | ||
| | `bugprone-string-constructor.LargeLengthThreshold` | `8388608` (8 MB) | Strings larger than 8 MB from a length argument are flagged | | ||
| | `modernize-loop-convert.NamingStyle` | `CamelCase` | Auto-generated loop variable names use CamelCase | | ||
| | `readability-function-size.LineThreshold` | `1000` | Functions over 1000 lines are flagged | | ||
| | `readability-braces-around-statements.ShortStatementLines` | `0` | Braces required for all control-flow bodies, even single-line | |
There was a problem hiding this comment.
No strong preference, this is fine :)
|
|
||
| # Not part of `make format`: clang-tidy --fix rewrites logic (renames identifiers, | ||
| # changes expressions, adds/removes keywords), not just style. Run manually and | ||
| # review the diff before committing. |
There was a problem hiding this comment.
this can't fix everything right? Should we note that this is not guaranteed to fix everything?
| generate_compile_commands: | ||
| uv run python -m scripts.generate_compile_commands | ||
|
|
||
| check_lint_cpp: |
There was a problem hiding this comment.
I more ment like, "Typing make format_check does not also check the cpp linter" but that's a small nit.
There was a problem hiding this comment.
Let's also add this to the merge workflow? https://github.com/Snapchat/GiGL/blob/main/.github/workflows/on-pr-merge.yml
| jobs: | ||
| build: | ||
| name: Build and release pip whl | ||
| name: Build and release pip whl (${{ matrix.torch-variant }}) |
There was a problem hiding this comment.
Do we need to update our install guidance for the correct name now?
|
|
||
| _REPO_ROOT: Path = Path(__file__).resolve().parent.parent | ||
| _CMAKE_BUILD_DIR: Path = _REPO_ROOT / ".cache" / "cmake_build_lint" | ||
| _COMPILE_COMMANDS: Path = _REPO_ROOT / ".cache" / "compile_commands.json" |
There was a problem hiding this comment.
should this be PUBLIC if we import it later?
Scope of work done
C++ Infrastructure for GiGL
This PR adds the foundational C++ infrastructure to GiGL — everything needed to write, build, lint, and test C++ code alongside the existing Python codebase.
What's included
Build system — Switches the build backend from setuptools to scikit-build-core + CMake.
CMakeLists.txtat the repo root auto-discovers and builds allpython_*.cpp(andpython_*.cu) files undergigl/csrc/as pybind11 extension modules. The build is triggered automatically bymake build_cpp_extensions(uv pip install -e . --no-build-isolation) and is wired intomake unit_test_py,make integration_test,make install_dev_deps, and pipeline compilation targets so it runs without a separate manual step.Release pipeline —
release.ymlis updated to build and publish two wheels — one against CPU torch headers and one against CUDA 12.8 torch headers — using a matrix build. Both wheels coexist in the registry index, differentiated by a build tag (1cpu/1cu128).Formatting and linting —
.clang-formatand.clang-tidyconfigs establish C++ style conventions.make format_cppformats in-place.make check_lint_cppruns fast static analysis usingclangd --check(roughly 10× faster than clang-tidy due to preamble caching).make fix_lint_cppapplies auto-fixable violations viaclang-tidy --fixand is intentionally separate frommake formatsince it rewrites logic, not just style.check_lint_cppis included inlint_test; it is not part oftype_check(Python/mypy only).Dependency installation —
requirements/install_cpp_deps.shinstallsclang-format-15,clang-tidy-15,clangd-15,clang++-15,libstdc++-12-dev, andcmakeon Linux.clang++-15andlibstdc++-12-devare required so thatgenerate_compile_commands.pycan rewritecompile_commands.jsonto use clang natively and clangd can resolve standard headers without a--query-driverworkaround.Compile commands generation —
scripts/generate_compile_commands.pygenerates.cache/compile_commands.jsonfor clangd by running CMake in a dedicated lint build directory (.cache/cmake_build_lint, separate from scikit-build-core's.cache/cmake_buildto avoid generator conflicts). It post-processes the output to replace the compiler withclang++-15. Called automatically byrun_cpp_lint.pybefore each lint run, and directly viamake generate_compile_commandsfor IDE/clangd setup.Linting script —
scripts/run_cpp_lint.pywrapsclangd --check, runs checks in parallel, and produces clean output: a green pass message or a per-file list of failures. All clangd noise is filtered.Documentation —
docs/cpp_style_guide.mddocuments the formatting and linting rules, how to run them, and rationale for disabled checks.Unit test scaffolding —
tests/unit/cpp/with a standaloneCMakeListsand a smoke test to verify the infrastructure works end to end. Run viamake unit_test_cpp; included inmake unit_test.Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO