
Benchmarks

70 tasks. 6 categories. CodeSift CLI vs Bash (grep/Read/find). Tested on a real 4,127-file TypeScript codebase.

Each task was given to two separate Claude agents in separate conversations: one equipped with CodeSift, one with standard Bash tools. Both were required to answer in an identical format.

A

Text Search

Find text patterns across the codebase — regex, imports, usage patterns

Tool: search_text (CodeSift wins)
Tokens: 48,930 vs 72,993 (-33%, 24,063 saved)
Tool calls: 11 vs 29 (10 tasks)
Time: 1m 10s vs 1m 36s (wall clock)
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
A1 Find all Prisma transactions in service files 1 / 2 20 matches 20 matches
A2 Find all files that import from @/lib/errors 1 / 2 97 files 88 files CS
A3 Find all TODO and FIXME comments in src/ 2 / 2 8 items 8 items
A4 Find all files that use the withAuth wrapper 1 / 2 103 files 97 files CS
A5 Find all process.env usage across the entire project 1 / 1 40 env vars 38 env vars CS
A6 Find all async functions matching *Risk using regex 1 / 2 35 functions 35 functions
A7 Find all places that throw AppError in the codebase 1 / 2 262 throw sites 185 throw sites CS
A8 Find all Redis usage (case-insensitive search) 1 / 2 3 files 3 files
A9 Find all exported HTTP route handlers (GET/POST/PATCH/DELETE) 1 / 2 147 handlers 147 handlers
A10 Find console.log statements in production (non-test) files 1 / 2 5 statements 1 statement CS
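For context on the tool-call gap, the Bash side of a task like A2 ("find all files that import from @/lib/errors") reduces to a recursive grep. The sketch below runs against a throwaway fixture; the file names are illustrative, not from the benchmark repo:

```shell
#!/bin/sh
# Build a tiny fixture: one file imports the target module, one does not.
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
cat > "$workdir/src/errors-user.ts" <<'EOF'
import { AppError } from "@/lib/errors";
EOF
cat > "$workdir/src/logger-user.ts" <<'EOF'
import { log } from "@/lib/logger";
EOF

# -r: recurse, -l: print matching file names only, -F: treat pattern as a literal.
importers=$(grep -rlF '@/lib/errors' "$workdir/src")
echo "$importers"
```

The differing counts in A2, A4, and A7 suggest this is where a single index-backed call and a hand-written regex pass diverge: the pattern itself is easy, but covering every import style in one grep invocation is not.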
B

Symbol Search

Find function definitions, interfaces, types, and components by name

Tool: search_symbols (CodeSift wins)
Tokens: 49,609 vs 60,282 (-18%, 10,673 saved)
Tool calls: 10 vs 20 (10 tasks)
Time: 1m 3s vs 2m (wall clock)
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
B1 Find the definition of the createRisk function 1 / 2 file:line + signature file:line (no signature) CS
B2 Find the DocumentDetail interface definition 1 / 2 2 definitions found 2 definitions found
B3 Find all React hooks (use*) in the components directory 1 / 2 10 hooks 10 hooks
B4 List all functions exported by risk.service.ts 1 / 2 4 functions + signatures 4 functions CS
B5 Find the AuditAction type/enum definition 1 / 2 definition + related types definition only CS
B6 Find all functions whose name starts with "create" 1 / 2 100 results ~80 results CS
B7 Find the RiskSummary interface definition 1 / 2 2 definitions 2 definitions
B8 Find all Zod validation schemas in the validators directory 1 / 2 100 schemas ~60 schemas CS
B9 Find the RiskPanel React component definition 1 / 2 found (.tsx parsed) found
B10 Find the withWorkspace higher-order function and its body 1 / 2 definition + body definition + body
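A plausible reason the Bash counts in B6 and B8 come in low is that regex-only symbol search keys on one declaration form at a time. A naive pattern for "functions starting with create" catches function declarations but silently misses arrow functions bound with const. The fixture below is illustrative:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/svc.ts" <<'EOF'
export function createRisk() {}
export const createUser = async () => {};
function updateRisk() {}
EOF

# Catches `function createX(...)` declarations only; the const-bound arrow
# function `createUser` is not matched, so the count comes in low.
matches=$(grep -rnE 'function +create[A-Za-z]*' "$workdir")
echo "$matches"
```

An AST-based symbol search sees both declaration forms as functions, which would account for the 100 vs ~80 gap in B6.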
C

File Structure

Navigate directory trees, file outlines, and repository structure

Tools: get_file_tree / get_file_outline (CodeSift wins)
Tokens: 36,580 vs 45,489 (-20%, 8,909 saved)
Tool calls: 10 vs 10 (10 tasks)
Time: 21s vs 46s (wall clock)
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
C1 List the contents of the src/lib directory 1 / 1 files + symbol counts files only CS
C2 Show the outline of risk.service.ts (exports, functions, types) 1 / 1 structured AST outline grep-based approximation CS
C3 Find all test files (*.test.ts) in the project 1 / 1 files + symbol metadata file paths only CS
C4 Show the directory tree at depth 2 1 / 1 compact flat list tree output
C5 Show the structure of the components directory 1 / 1 symbol-enriched file list CS
C6 Find files with more than 20 symbols (complex files) 1 / 1 22x less output (compact + min_symbols) full listing CS
C7 Find all route.ts files across the project 1 / 1 8x less output (name_pattern) find + list CS
C8 Overview of all service files in src/lib 1 / 1 grouped by file, symbol counts file list CS
C9 Generate a compact overview of the entire repository 1 / 1 structured compact output tree + wc CS
C10 List the API routes directory with handler functions 1 / 1 routes + handlers listed file paths only CS
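The "file paths only" entries above reflect what the Bash toolchain natively returns: find yields names and nothing else, so any per-file symbol information costs additional reads. A minimal sketch of the C3-style baseline, on an illustrative fixture:

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib"
touch "$workdir/src/lib/risk.service.ts" "$workdir/src/lib/risk.service.test.ts"

# find -name returns matching paths only; there is no per-file symbol
# metadata without opening each file afterwards.
tests=$(find "$workdir" -name '*.test.ts')
echo "$tests"
```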
D

Code Retrieval

Read specific function bodies, type definitions, and code blocks

Tools: get_symbol / get_symbols (CodeSift wins)
Tokens: 57,703 vs 60,482 (-5%, 2,779 saved)
Tool calls: 32 vs 29 (10 tasks)
Time: 5m vs 1m 55s (wall clock)
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
D1 Read the createRisk function body 3 / 2 exact symbol boundaries grep + line range CS
D2 Read the RiskSummary interface definition 2 / 2 full interface full interface
D3 Read 3 related functions in risk.service.ts 3 / 3 batch get_symbols 3 separate reads
D4 Read the AppError class definition 2 / 2 class + methods class + methods
D5 Read a Prisma enum definition 4 / 3 found after fallback direct grep Bash
D6 Read the withAuth HOF and its return type 3 / 3 function + types function + types
D7 Read multiple related type definitions 4 / 4 batch retrieval sequential reads
D8 Read a test case helper function 4 / 3 found (IDs were undefined, fixed) direct read Bash
D9 Read the risk analysis pipeline entry point 3 / 3 exact function file + line range CS
D10 Read a React component with its prop types 4 / 4 component + Props type full file read CS
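The Bash baseline in this category is a two-step pattern: grep -n to find where a symbol starts, then sed to print a line range. Plain text tools have no notion of where the symbol ends, so the range is a guess. A sketch on a throwaway fixture (the 3-line window is deliberately arbitrary):

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/risk.service.ts" <<'EOF'
export function createRisk() {
  return { ok: true };
}
export function deleteRisk() {}
EOF

# Step 1: locate the declaration's line number.
start=$(grep -n 'function createRisk' "$workdir/risk.service.ts" | cut -d: -f1)
# Step 2: print a fixed-size window. Unlike symbol-aware retrieval there is
# no exact end boundary; too small a window truncates, too large wastes tokens.
body=$(sed -n "${start},$((start + 2))p" "$workdir/risk.service.ts")
echo "$body"
```

That boundary guessing cuts both ways, which is consistent with D being the closest category: the two-step grep + sed pattern is cheap when the guess lands (D5, D8 went to Bash), while exact boundaries pay off on longer symbols.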
E

Relationships

Find references, trace call chains, understand code connections

Tools: find_references / trace_call_chain (CodeSift wins)
Tokens: 52,312 vs 60,810 (-14%, 8,498 saved)
Tool calls: 10 vs 10 (10 tasks)
Time: 1m 19s vs 1m 28s (wall clock)
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
E1 Find all callers of the createRisk function 1 / 1 all callers found all callers found
E2 What functions does analyzeDocument call? 1 / 1 structured call tree flat grep list CS
E3 Trace createRisk call chain 2 levels deep 1 / 1 transitive tree in 1 call manual 2-step trace CS
E4 Find all references to the RiskSummary type 1 / 1 imports + types + usages grep matches CS
E5 Find every file that references withAuth 1 / 1 all usage sites all usage sites
E6 Trace acceptRisk call chain 2 levels deep 1 / 1 structured tree manual trace CS
E7 Find all call sites of getRiskById 1 / 1 all call sites all call sites
E8 Find all files that import or render RiskPanel 1 / 1 import + render sites import + render sites
E9 Full call chain of createRisk (max depth) 1 / 1 deepest trace OK partial (manual) CS
E10 Find all usages of the RiskItem component 1 / 1 all usages all usages
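The "flat grep list" vs "structured call tree" distinction in E2/E3 comes down to this: grep reports every textual occurrence of a name (the definition, imports, and call sites) as one undifferentiated list, and tracing the chain one level deeper means re-running it for each discovered caller. Illustrative fixture:

```shell
#!/bin/sh
workdir=$(mktemp -d)
cat > "$workdir/risk.service.ts" <<'EOF'
export function createRisk() {}
EOF
cat > "$workdir/api.ts" <<'EOF'
import { createRisk } from "./risk.service";
const risk = createRisk();
EOF

# One flat list: the definition, the import, and the call site all look the
# same to grep. Building a 2-level call chain (task E3) means repeating this
# search per caller found in the first pass.
refs=$(grep -rn 'createRisk' "$workdir")
echo "$refs"
```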
G

Semantic Search

Answer conceptual questions about the codebase using embeddings

Tool: codebase_retrieval (semantic) (CodeSift wins)
Quality: 7.8/10 vs 6.5/10 (+20% better answers)
Tasks: 10 conceptual questions
Metric: human-rated, 1-10 scale, on answer completeness and accuracy
ID Task Calls (CodeSift / Bash) CodeSift result Bash result Win
G1 How does the permission and auth system work? 1 / 3 9/10 — auth middleware + decorators + guards 7/10 — partial — missed decorators CS
G2 What caching strategies are used in this project? 1 / 2 8/10 — Redis + Next.js cache + AI prompt cache 7/10 — Redis + Next.js cache CS
G3 How are errors handled across the application? 1 / 2 9/10 — AppError hierarchy + handlers + middleware 4/10 — grep for "catch" only CS
G4 How is multi-tenancy implemented? 1 / 2 7/10 — org-based isolation patterns 6/10 — partial org references CS
G5 How does the analysis pipeline work end-to-end? 1 / 3 10/10 — full pipeline: ingestion → analysis → scoring 6/10 — partial — missed scoring step CS
G6 What API security measures are in place? 1 / 2 7/10 — auth guards + rate limiting + CORS 5/10 — auth guards only CS
G7 How is state managed in the React frontend? 1 / 2 7/10 — context + hooks patterns 8/10 — useState/useContext grep Bash
G8 What testing patterns and frameworks are used? 1 / 2 6/10 — Vitest + testing-library (noise from test files) 7/10 — Vitest + testing-library (clean) Bash
G9 How does the Qdrant vector database integration work? 1 / 2 9/10 — init + indexing + query flow 7/10 — init + query (missed indexing) CS
G10 How are database transactions handled in services? 1 / 2 6/10 — $transaction patterns (some noise) 8/10 — $transaction patterns (clean) Bash

Benchmarks Planned

19 tools awaiting benchmarks. 10 admin/utility tools have no comparison target.

LSP Bridge

go_to_definition planned New tool — LSP bridge added after benchmark round
get_type_info planned New tool — LSP bridge added after benchmark round
rename_symbol planned New tool — LSP bridge added after benchmark round

Context

get_context_bundle planned Combines multiple tools — needs composite benchmark
assemble_context planned L0-L3 compression needs token efficiency benchmark
detect_communities planned No grep equivalent — needs quality-based evaluation
get_knowledge_map planned No grep equivalent — needs quality-based evaluation

Analysis

find_dead_code planned No grep equivalent — needs precision/recall evaluation
analyze_complexity planned No grep equivalent — needs accuracy evaluation
find_clones planned No grep equivalent — needs precision evaluation
analyze_hotspots planned Requires git history — needs accuracy evaluation
search_patterns planned Anti-pattern detection needs false-positive benchmark
impact_analysis planned Needs blast radius accuracy evaluation

Cross-Repo

cross_repo_search planned Requires multi-repo setup for benchmark
cross_repo_refs planned Requires multi-repo setup for benchmark

Search

find_and_show planned Compound of search_symbols + get_symbol — needs benchmark

Graph

trace_route planned HTTP route tracing — needs accuracy evaluation

Diff

changed_symbols planned Git-based — needs accuracy evaluation
diff_outline planned Git-based — needs accuracy evaluation

Admin

suggest_queries n/a Discovery tool — no performance comparison possible
generate_report n/a Output tool — no comparison target
generate_claude_md n/a Output tool — no comparison target
usage_stats n/a Reporting tool — no comparison target
index_folder n/a Setup tool — no comparison target
index_repo n/a Setup tool — no comparison target
index_file n/a Setup tool — no comparison target
invalidate_cache n/a Admin tool — no comparison target
list_repos n/a Admin tool — no comparison target
list_patterns n/a Admin tool — no comparison target

Methodology

Test Setup

Codebase: promptvault — 4,127 files, 19,707 symbols
Date: 2026-03-14
Agents: Claude with identical prompts, separate conversations
Tasks: 70 tasks across 6 categories

Metrics

Token efficiency: total tokens consumed to complete each task (lower is better for cost and speed)
Tool calls: number of tool invocations required (fewer means faster iteration)
Quality (semantic): human-rated on a 1-10 scale for answer completeness and accuracy
Reproducibility: all benchmark scripts are available in the CodeSift repository
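The savings percentages in the category cards are plain arithmetic over the token totals, rounded to the nearest whole percent. Using category A's numbers:

```shell
#!/bin/sh
codesift_tokens=48930
bash_tokens=72993
saved=$((bash_tokens - codesift_tokens))
# Round to the nearest percent: integer shell division would truncate
# 32.96 down to 32, so delegate the division to awk's printf.
pct=$(awk -v s="$saved" -v b="$bash_tokens" 'BEGIN { printf "%.0f", 100 * s / b }')
echo "${saved} tokens saved (${pct}%)"
```

This prints "24063 tokens saved (33%)", matching the category A card above.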