Implement local code graph with DuckDB backend
### Problem to solve

The `orbit` CLI can index local repositories and compile queries to SQL, but it cannot execute those queries locally. `orbit index` builds an in-memory `GraphData` and discards it. `orbit query` emits ClickHouse SQL that has nowhere to run without a server.

MR !590 built a working proof of concept: a `duckdb-client` crate, a `SqlDialect` enum in the codegen layer, and a `compile_local()` function that skips security/redaction. It indexed the GitLab monolith in 5.7s and ran all 5 query types in 36-54ms. The MR got reviewer feedback that needs to be addressed before merging.

This issue tracks the production-ready implementation of local code graph queries.

### Proposed solution

Build on MR !590 to deliver a complete local code graph in the `orbit` CLI. The work breaks down into these pieces:

**1. Query engine refactor**

- Replace `compile_local()` with a `GraphQueryCompilerContext` struct passed to the existing `compile()` function
- Add `dialect: SqlDialect` and `local: bool` fields to the context
- When `local = true`, skip `enforce_return`, `apply_security_context`, and `check_ast`
- Split `codegen.rs` into `codegen/clickhouse.rs` and `codegen/duckdb.rs` with shared helpers
- This keeps mature ClickHouse codepaths untouched while DuckDB variations live in their own file

**2. DuckDB client crate cleanup**

- Reuse `ArrowUtils` helpers from `gkg-utils` instead of the standalone `local_converter.rs`
- Handle `_version` column type conversion (Timestamp -> BIGINT) in the conversion layer
- Ensure positive node IDs (the hash-based ID scheme can produce negatives, which fail DSL validation)

**3. `orbit index` persistence**

- After building `GraphData`, call `assign_node_ids()` and convert to Arrow RecordBatches
- Write to `~/.orbit/indexes/<repo>/graph.duckdb` via `DuckDbClient::insert_arrow()`
- Delete existing data for the project/branch before inserting (full reindex)
- Update manifest status

**4. `orbit query --local` execution**

- Open the DuckDB file for the target repo
- Load ontology, compile the query with `SqlDialect::DuckDb`
- Execute against DuckDB, format results via GraphFormatter
- Support `--format json|pretty|goon` for output format selection

**5. File locking**

- Cross-process file lock on the DuckDB database file during writes
- DuckDB handles intra-process concurrency, but multiple CLI invocations need coordination
- Use advisory file locking (`flock` on Unix) on a `.lock` sidecar file

**6. Incremental reindexing (stretch)**

- The old GKG had file-change diffing via `LadybugChanges` syncer
- For v1, full reindex is acceptable (5.7s on the monolith is fast enough)
- Incremental support can follow as an optimization

### DuckDB SQL dialect differences (reference)

From the design doc and MR !590:

| Construct | ClickHouse | DuckDB |
|---|---|---|
| Parameters | `{pN:Type}` | `$N` (1-indexed) |
| `startsWith` | `startsWith` | `starts_with` |
| `has` | `has` | `list_contains` |
| `array` | `array` | `list_value` |
| `arrayConcat` | `arrayConcat` | `list_concat` |
| `tuple` | `tuple` | `row` |
| `if(a,b,c)` | `if(a,b,c)` | `CASE WHEN a THEN b ELSE c END` |
| SET statements | emitted | skipped |
| IN with arrays | single array param | element-by-element expansion |
| Recursive CTE LIMIT | allowed in body | must be on outer query |
| Recursive CTE UNION | multiple branches | exactly one UNION ALL |

### Done criteria

- [ ] All 5 query types work against local DuckDB (search, traversal, aggregation, path_finding, neighbors)
- [ ] `orbit index` persists graph data to disk
- [ ] `orbit query --local` executes queries and returns formatted results
- [ ] GraphFormatter produces the same output shape as the production service
- [ ] Codegen split into separate dialect files
- [ ] `compile()` accepts context struct instead of separate `compile_local()`
- [x] File locking prevents concurrent write corruption
- [ ] GitLab monolith benchmark: index <10s, queries <100ms
- [ ] Existing query-engine tests pass unchanged
- [x] New unit tests for DuckDB codegen (side-by-side SQL comparison)
- [ ] E2E test: index fixture repo, run all 5 query types

### References

- Epic: https://gitlab.com/groups/gitlab-org/-/epics/21406
- DuckDB PoC MR: !590
- Design doc: `docs/design-documents/local_code_graph.md` (branch `michaelangeloio/duckdb-local-queries`)
- Old local GKG: https://gitlab.com/gitlab-org/rust/knowledge-graph
- Parent GA epic: https://gitlab.com/groups/gitlab-org/-/epics/19744
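
### Illustrative sketch: dialect-specific emission

To make the dialect differences listed in this issue concrete, here is a minimal sketch of how a few of them might be expressed as shared codegen helpers. It is not the code from MR !590. The helper names `function_name`, `placeholder`, and `conditional` are hypothetical, the `SqlDialect::ClickHouse` variant spelling is assumed (only `SqlDialect::DuckDb` appears in this issue), and it assumes both dialects receive the same 1-based parameter position.

```rust
/// Target SQL dialect. `SqlDialect` is named in this issue; the
/// `ClickHouse` variant spelling is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SqlDialect {
    ClickHouse,
    DuckDb,
}

/// Hypothetical helper: translate a ClickHouse function name into the
/// equivalent for the target dialect. Names without a known rename pass
/// through unchanged.
pub fn function_name(dialect: SqlDialect, clickhouse_name: &str) -> &str {
    match dialect {
        SqlDialect::ClickHouse => clickhouse_name,
        SqlDialect::DuckDb => match clickhouse_name {
            "startsWith" => "starts_with",
            "has" => "list_contains",
            "array" => "list_value",
            "arrayConcat" => "list_concat",
            "tuple" => "row",
            other => other,
        },
    }
}

/// Hypothetical helper: render a bound-parameter placeholder.
/// `position` is assumed to be 1-based for both dialects; DuckDB ignores
/// the ClickHouse type annotation.
pub fn placeholder(dialect: SqlDialect, position: usize, clickhouse_type: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("{{p{}:{}}}", position, clickhouse_type),
        SqlDialect::DuckDb => format!("${}", position),
    }
}

/// Hypothetical helper: render a three-way conditional from already
/// rendered SQL fragments.
pub fn conditional(dialect: SqlDialect, cond: &str, then: &str, otherwise: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("if({}, {}, {})", cond, then, otherwise),
        SqlDialect::DuckDb => {
            format!("CASE WHEN {} THEN {} ELSE {} END", cond, then, otherwise)
        }
    }
}
```

With these helpers, `function_name(SqlDialect::DuckDb, "has")` yields `list_contains`, and `placeholder(SqlDialect::DuckDb, 2, "Int64")` yields `$2`. Keeping the ClickHouse arm as a pass-through mirrors the stated goal of leaving the mature ClickHouse codepaths untouched.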
issue