Implement local code graph with DuckDB backend
### Problem to solve
The `orbit` CLI can index local repositories and compile queries to SQL, but it cannot execute those queries locally. `orbit index` builds an in-memory `GraphData` and discards it. `orbit query` emits ClickHouse SQL that has nowhere to run without a server.
MR !590 built a working proof of concept: a `duckdb-client` crate, a `SqlDialect` enum in the codegen layer, and a `compile_local()` function that skips security/redaction. It indexed the GitLab monolith in 5.7s and ran all 5 query types in 36-54ms. The MR got reviewer feedback that needs to be addressed before merging.
This issue tracks the production-ready implementation of local code graph queries.
### Proposed solution
Build on MR !590 to deliver a complete local code graph in the `orbit` CLI. The work breaks down into these pieces:
**1. Query engine refactor**
- Replace `compile_local()` with a `GraphQueryCompilerContext` struct passed to the existing `compile()` function
- Add `dialect: SqlDialect` and `local: bool` fields to the context
- When `local = true`, skip `enforce_return`, `apply_security_context`, and `check_ast`
- Split `codegen.rs` into `codegen/clickhouse.rs` and `codegen/duckdb.rs` with shared helpers
- This keeps mature ClickHouse codepaths untouched while DuckDB variations live in their own file
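
The context-struct approach above could look roughly like the following sketch. `GraphQueryCompilerContext`, `SqlDialect`, and the `dialect`/`local` fields come from this issue; the `Pass` enum, the `planned_passes` helper, and the ordering of the passes are hypothetical and only illustrate how a single `compile()` entry point might branch on `local`.

```rust
/// SQL dialect targeted by codegen (named after the `SqlDialect` enum from MR !590).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SqlDialect {
    ClickHouse,
    DuckDb,
}

/// Context passed to `compile()`; the two fields follow the issue text.
#[derive(Debug, Clone, Copy)]
pub struct GraphQueryCompilerContext {
    pub dialect: SqlDialect,
    pub local: bool,
}

/// Hypothetical labels for the compiler stages named in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pass {
    EnforceReturn,
    ApplySecurityContext,
    CheckAst,
    Codegen(SqlDialect),
}

/// Lists the stages a compile run would execute for the given context.
/// The ordering here is an assumption, not the real pipeline order.
pub fn planned_passes(ctx: &GraphQueryCompilerContext) -> Vec<Pass> {
    let mut passes = Vec::new();
    if !ctx.local {
        // Server-side only: local runs skip these three stages.
        passes.push(Pass::EnforceReturn);
        passes.push(Pass::ApplySecurityContext);
        passes.push(Pass::CheckAst);
    }
    // Codegen always runs and dispatches on the dialect.
    passes.push(Pass::Codegen(ctx.dialect));
    passes
}
```

Keeping `dialect` and `local` as separate fields leaves room for combinations such as a DuckDB target with security checks enabled, should that ever be needed.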
**2. DuckDB client crate cleanup**
- Reuse `ArrowUtils` helpers from `gkg-utils` instead of the standalone `local_converter.rs`
- Handle `_version` column type conversion (Timestamp -> BIGINT) in the conversion layer
- Ensure positive node IDs (the hash-based ID scheme can produce negatives, which fail DSL validation)
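
One possible way to guarantee positive IDs is to clear the sign bit of the raw hash before it is stored. This is a minimal sketch: `to_positive_id` is a hypothetical helper, and it assumes the raw ID is available as an `i64` and that zero should also be avoided.

```rust
/// Maps a raw hash-based ID onto the range 1..=i64::MAX.
/// Hypothetical helper; the real ID scheme may differ.
pub fn to_positive_id(raw: i64) -> i64 {
    // Reinterpret as unsigned and clear the top (sign) bit.
    let cleared = (raw as u64 & i64::MAX as u64) as i64;
    // Assumption: 0 is treated as invalid too, so remap it to 1.
    cleared.max(1)
}
```

Clearing the sign bit folds each negative value onto a non-negative one, so the ID space is halved; whether that is acceptable depends on the collision tolerance of the hash scheme.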
**3. `orbit index` persistence**
- After building `GraphData`, call `assign_node_ids()` and convert to Arrow RecordBatches
- Write to `~/.orbit/indexes/<repo>/graph.duckdb` via `DuckDbClient::insert_arrow()`
- Delete existing data for the project/branch before inserting (full reindex)
- Update manifest status
**4. `orbit query --local` execution**
- Open the DuckDB file for the target repo
- Load ontology, compile the query with `SqlDialect::DuckDb`
- Execute against DuckDB, format results via `GraphFormatter`
- Support `--format json|pretty|goon` for output format selection
**5. File locking**
- Cross-process file lock on the DuckDB database file during writes
- DuckDB handles intra-process concurrency, but multiple CLI invocations need coordination
- Use advisory file locking (`flock` on Unix) on a `.lock` sidecar file
**6. Incremental reindexing (stretch)**
- The old GKG had file-change diffing via the `LadybugChanges` syncer
- For v1, full reindex is acceptable (5.7s on the monolith is fast enough)
- Incremental support can follow as an optimization
### DuckDB SQL dialect differences (reference)
From the design doc and MR !590:
| Construct | ClickHouse | DuckDB |
|---|---|---|
| Parameters | `{pN:Type}` | `$N` (1-indexed) |
| `startsWith` | `startsWith` | `starts_with` |
| `has` | `has` | `list_contains` |
| `array` | `array` | `list_value` |
| `arrayConcat` | `arrayConcat` | `list_concat` |
| `tuple` | `tuple` | `row` |
| `if(a,b,c)` | `if(a,b,c)` | `CASE WHEN a THEN b ELSE c END` |
| SET statements | emitted | skipped |
| IN with arrays | single array param | element-by-element expansion |
| Recursive CTE LIMIT | allowed in body | must be on outer query |
| Recursive CTE UNION | multiple branches | exactly one UNION ALL |
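
A few rows of the table can be sketched as small rendering helpers. The function names below (`render_param`, `function_name`, `render_if`) are hypothetical and not taken from MR !590, and the assumption that the ClickHouse `pN` index is zero-based is mine; only the `SqlDialect` enum and the mappings themselves come from the table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SqlDialect {
    ClickHouse,
    DuckDb,
}

/// Renders the placeholder for the parameter at zero-based `index`.
/// Assumption: ClickHouse names are `p0`, `p1`, ...; DuckDB uses 1-indexed `$N`.
pub fn render_param(dialect: SqlDialect, index: usize, ch_type: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("{{p{}:{}}}", index, ch_type),
        SqlDialect::DuckDb => format!("${}", index + 1),
    }
}

/// Translates a ClickHouse function name to its DuckDB counterpart.
/// Names without an entry in the table pass through unchanged.
pub fn function_name<'a>(dialect: SqlDialect, clickhouse_name: &'a str) -> &'a str {
    if dialect == SqlDialect::ClickHouse {
        return clickhouse_name;
    }
    match clickhouse_name {
        "startsWith" => "starts_with",
        "has" => "list_contains",
        "array" => "list_value",
        "arrayConcat" => "list_concat",
        "tuple" => "row",
        other => other,
    }
}

/// Renders a conditional: `if(a,b,c)` for ClickHouse, a CASE expression for DuckDB.
/// The arguments are assumed to be already-rendered SQL fragments.
pub fn render_if(dialect: SqlDialect, cond: &str, then: &str, otherwise: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("if({},{},{})", cond, then, otherwise),
        SqlDialect::DuckDb => format!("CASE WHEN {} THEN {} ELSE {} END", cond, then, otherwise),
    }
}
```

Helpers of this shape would be natural candidates for the shared code between `codegen/clickhouse.rs` and `codegen/duckdb.rs`, or could be split so that each file owns its own variant.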
### Done criteria
- [ ] All 5 query types work against local DuckDB (search, traversal, aggregation, path_finding, neighbors)
- [ ] `orbit index` persists graph data to disk
- [ ] `orbit query --local` executes queries and returns formatted results
- [ ] `GraphFormatter` produces the same output shape as the production service
- [ ] Codegen split into separate dialect files
- [ ] `compile()` accepts context struct instead of separate `compile_local()`
- [x] File locking prevents concurrent write corruption
- [ ] GitLab monolith benchmark: index <10s, queries <100ms
- [ ] Existing query-engine tests pass unchanged
- [x] New unit tests for DuckDB codegen (side-by-side SQL comparison)
- [ ] E2E test: index fixture repo, run all 5 query types
### References
- Epic: https://gitlab.com/groups/gitlab-org/-/epics/21406
- DuckDB PoC MR: !590
- Design doc: `docs/design-documents/local_code_graph.md` (branch `michaelangeloio/duckdb-local-queries`)
- Old local GKG: https://gitlab.com/gitlab-org/rust/knowledge-graph
- Parent GA epic: https://gitlab.com/groups/gitlab-org/-/epics/19744