fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48
fix(parsers/javascript): extract Express anonymous route handler callbacks (#21)#48ChrisJr404 wants to merge 1 commit into
Conversation
…backs (knostic#21) The analyzer never walked into call-expression argument lists, so when a route was defined idiomatically as router.post('/orders', authenticateToken, async (req, res) => { ... }); the only thing it picked up was the named middleware reference. The actual handler body — where most of the application's business logic and vulnerabilities live — was invisible to every downstream stage, causing reachability_filter to drop almost everything as "unreachable from input." Adds an _extractRouteHandlerCallbacks pass that walks every CallExpression in the file, detects the Express verb shape (`<receiver>.{get,post,put,patch,delete,options,head,all,use}(...)`), and extracts each ArrowFunction / FunctionExpression argument as a unit with: - synthetic name `<METHOD> <path>` (e.g. `POST /orders`) when the path is a string literal — matches the "method and path as metadata" expectation in the issue. - `unitType: "route_handler"` so the existing classifier downstream doesn't have to re-derive it from the body. - `isEntryPoint: true` so reachability_filter treats it as a request-data entry the way it already treats named (req, res) middleware. - `httpMethod` / `httpPath` properties carried through for any follow-up steps that want to render route info. Multi-callback registrations (chained middleware + final handler) get suffixed with their post-path arg index so they don't collide. `SyntaxKind.CallExpression` is resolved dynamically off the typescript dep at call time — its numeric value drifts between typescript releases (213 in older versions, 214 in 5.x). Smoke-tested against the issue's example file plus three additional shapes (`router.get(path, handler)`, `app.use(path, mw)`, `router.delete(path, mw1, mw2, handler)`); all four units extracted with correct method / path / index. The named middleware references (`authenticateToken`) correctly stay out of the unit list since they parse as Identifiers, not ArrowFunctions. Hapi / Koa / Fastify use a different shape (object literal with a `handler` property) and are out of scope for this fix.
|
Hey @ChrisJr404 — thanks for jumping on Issue #21 and for the detailed approach + smoke test. Your diagnosis was spot on: the analyzer never walked into Express callback argument lists, and We ended up going with PR #49 (@joshbouncesecurity) for this one and have already merged it as part of release/2026-05-12. The two PRs identified the same root cause; #49's implementation just covered a few more edges that mattered for the broader pipeline:
None of that means your PR was wrong — it just landed at a different point on the precision/coverage tradeoff. Closing this one as superseded by #49. A few of your other PRs are next on our review queue (#52 OpenRouter, #53 |
Closes #21.
The analyzer never walked into call-expression argument lists, so when a route was defined idiomatically as
```js
router.post('/orders', authenticateToken, async (req, res) => { ... });
```
the only thing it picked up was the named middleware reference. The actual handler body — where most of the application's business logic and vulnerabilities live — was invisible to every downstream stage, which is why `reachability_filter` was dropping ~75% of the units in the example codebase.
Approach
New `_extractRouteHandlerCallbacks` pass on every source file that walks every `CallExpression`, detects the Express verb shape (`.{get,post,put,patch,delete,options,head,all,use}(...)`), and extracts each `ArrowFunction` / `FunctionExpression` argument as its own unit with:
Multi-callback registrations (chained middleware + final handler) get suffixed with their post-path arg index — `POST /orders` for the first callback, `POST /orders [1]` for the second, etc — so they don't collide.
`SyntaxKind.CallExpression` is resolved off the typescript dep at call time (its numeric value drifts between releases — 213 in older versions, 214 in 5.x) rather than hard-coded.
Smoke test
Ran the analyzer against the issue's example file plus three additional shapes:
```
$ node -e "const { TypeScriptAnalyzer } = require('./typescript_analyzer.js');
const a = new TypeScriptAnalyzer('/tmp/express_test_repo');
const result = a.analyzeFiles(['routes.js']);
for (const [id, f] of Object.entries(result.functions)) {
console.log(id, '| type=' + f.unitType, '| entry=' + (f.isEntryPoint||false),
'| method=' + (f.httpMethod||''), '| path=' + (f.httpPath||''));
}
"
routes.js:POST /orders [1] | type=route_handler | entry=true | method=POST | path=/orders
routes.js:GET /users/:id | type=route_handler | entry=true | method=GET | path=/users/:id
routes.js:USE /api | type=route_handler | entry=true | method=USE | path=/api
routes.js:DELETE /items/:id [2] | type=route_handler | entry=true | method=DELETE | path=/items/:id
```
Named middleware references (`authenticateToken`, `authorizeAdmin`) correctly stay out of the unit list since they parse as `Identifier`, not `ArrowFunction` / `FunctionExpression`. The existing variable-decl and module.exports passes still pick those up where they're defined.
What's not in scope
Test plan