# Claude Code Session Log Analysis

Analysis of session logs from `~/.claude/projects/-home-me-git-moss/`.

## Session Overview

- Main session: `6cb78e3b-873f-4a1f-aebe-c98d701052e3.jsonl` (59 MB)
- Subagents: 15+ agent files (200-500 KB each)

## Message Types
| Type | Count |
|---|---|
| assistant | 8,512 |
| user | 3,918 |
| file-history-snapshot | 319 |
| queue-operation | 110 |
| system | 41 |
| summary | 1 |
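
These counts come from a single pass over the JSONL file. A minimal sketch, assuming each line is a JSON object with a top-level `type` field, as observed in these logs:

```python
# Count top-level entry types in a Claude Code session log (sketch; assumes
# each JSONL line is a JSON object with a "type" field).
import json
from collections import Counter
from pathlib import Path

def count_message_types(log_path: str) -> Counter:
    counts: Counter = Counter()
    for line in Path(log_path).expanduser().read_text().splitlines():
        if line.strip():
            counts[json.loads(line).get("type", "unknown")] += 1
    return counts

print(count_message_types(
    "~/.claude/projects/-home-me-git-moss/"
    "6cb78e3b-873f-4a1f-aebe-c98d701052e3.jsonl"
))
```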
## Tool Usage
| Tool | Calls | Errors | Success Rate |
|---|---|---|---|
| Bash | 1,372 | 234 | 83% |
| Edit | 990 | 7 | 99% |
| Read | 644 | 4 | 99% |
| TodoWrite | 286 | 0 | 100% |
| Write | 253 | 9 | 96% |
| Grep | 221 | 0 | 100% |
| Glob | 40 | 0 | 100% |
| Task | 12 | 0 | 100% |
| Other | 35 | 3 | 91% |
Total: 3,853 tool calls, 257 errors (93% success rate)
Note: "File not read yet" errors are attributed to Write (the tool that checks), not Edit.
## Bash Command Patterns

Most common command prefixes:

- `uv` (550): running tests, Python scripts, package management
- `ruff` (331): linting and formatting
- `git` (314): version control
- `grep` (41): text search (despite the Grep tool!)
- `nix` (12): development environment
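
A sketch of the prefix tally, assuming Bash `tool_use` blocks carry the command string under `input["command"]`:

```python
# Tally the first token of each Bash tool invocation (sketch; duplicates
# from streaming chunks are skipped by block id).
import json
from collections import Counter
from pathlib import Path

def bash_prefixes(log_path: str) -> Counter:
    seen: set[str] = set()
    prefixes: Counter = Counter()
    for line in Path(log_path).expanduser().read_text().splitlines():
        if not line.strip():
            continue
        content = (json.loads(line).get("message") or {}).get("content") or []
        if not isinstance(content, list):
            continue
        for block in content:
            if (isinstance(block, dict) and block.get("type") == "tool_use"
                    and block.get("name") == "Bash" and block.get("id") not in seen):
                seen.add(block["id"])
                command = block.get("input", {}).get("command", "").strip()
                if command:
                    prefixes[command.split()[0]] += 1
    return prefixes
```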
## Error Analysis

### Bash Errors (234 total)
Exit codes:
- Exit code 1: 222 (general failure)
- Exit code 2: 7 (misuse)
- Exit code 124: 4 (timeout)
- Exit code 127: 1 (command not found)
Common error categories:
- Test failures (pytest): Test assertions failing during development
- Lint errors (ruff): Code style violations caught during check
- Build errors: Missing dependencies, import errors
- LSP server tests: Specific test file with persistent failures
### File Operation Errors (20 total)
| Error | Count | Cause |
|---|---|---|
| File not read yet | 9 | Write/Edit before Read |
| File does not exist | 4 | Read/Write to missing file |
| Found N matches | 3 | Edit string not unique |
| String not found | 2 | Edit target changed |
| File modified since read | 2 | Linter changed file |
### Other Errors (3 total)
- 2: User plan refinements (interactive editing, not rejections)
- 1: KillShell on already-completed shell
### User Rejections

The two "user doesn't want to proceed" errors were actually plan refinements: the user was editing and improving proposed plans via Claude Code's interactive mode, not rejecting work outright.
## Observations

### Patterns

- High Bash usage for testing: `uv run pytest` runs frequently, which is expected during development
- Linting integrated into workflow: `ruff check` and `ruff format` called 331 times
- Git operations frequent: 314 git commands for commits and status checks
- TodoWrite heavily used: 286 calls, showing good task-tracking discipline
### Potential Inefficiencies

- Using `grep` in Bash (41 calls) when the Grep tool exists and is optimized for this
- File read-before-write errors (9 cases): could be addressed with better tool sequencing
- Non-unique Edit strings (3 cases): edit operations need more surrounding context
- Low parallel tool usage: 99.95% of turns have only one content block; independent operations could be parallelized
### Turn Patterns
| Content Blocks | Count |
|---|---|
| 1 | 8,562 |
| 2 | 2 |
| 3 | 2 |
Almost all turns are sequential (1 tool call at a time). There's significant opportunity to parallelize independent operations like:
- Reading multiple files at once
- Running `git status` + `git diff` in parallel
- Multiple Grep searches
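
For reference, a sketch of how the distribution above can be computed: group streaming chunks by `requestId` and keep the maximum number of content blocks seen for each turn:

```python
# Content-block distribution per assistant turn (sketch; streaming chunks
# duplicate entries, so we take the max per requestId).
import json
from collections import Counter, defaultdict
from pathlib import Path

def blocks_per_turn(log_path: str) -> Counter:
    max_blocks: dict[str, int] = defaultdict(int)
    for line in Path(log_path).expanduser().read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("type") != "assistant":
            continue
        rid = entry.get("requestId")
        content = (entry.get("message") or {}).get("content") or []
        if rid and isinstance(content, list):
            max_blocks[rid] = max(max_blocks[rid], len(content))
    return Counter(max_blocks.values())
```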
### What's Working Well
- Edit success rate 99%: Very reliable file editing
- Read success rate 99%: File reading rarely fails
- TodoWrite 100% success: Task tracking working flawlessly
- Subagent usage: Only 12 Task calls, showing restraint in spawning subagents
- Subagents are focused: Mostly use Read/Grep/Glob for exploration, minimal Bash
## Subagent Patterns

15 subagents were spawned; their tool usage shows they're used for research:
- Read: Primary tool (11-27 calls per agent)
- Grep: Secondary (1-7 calls)
- Glob: File finding (1-4 calls)
- Bash: Minimal (0-10 calls, usually for running code)
Subagents don't edit files - they gather information for the main session.
## Recommendations

### For Claude Code Users
- Prefer Grep tool over bash grep: More integrated, better error handling
- Always Read before Write/Edit: Avoid the 9 "not read yet" errors
- Provide more context in Edit: Avoid non-unique string matches
- Parallelize independent operations: Read multiple files, run multiple commands in one turn
### For This Project

- LSP tests need attention: multiple failures in `test_lsp_server.py`
- Consider log analysis tooling: this manual analysis could be automated as a `moss` command
## Future Work

Could build a `moss analyze-logs` command that:
- Parses Claude Code session logs
- Computes tool success rates
- Identifies error patterns
- Suggests workflow improvements
- Estimates token costs
This would help users understand their agent interaction patterns.
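
A hypothetical skeleton for such a command; the subcommand name, flags, and report format are illustrative, not an existing moss interface. It reuses `tool_stats` from the sketch in the Tool Usage section, assumed here to live in a hypothetical `moss.loganalysis` module:

```python
# Hypothetical `moss analyze-logs` skeleton; names and flags are illustrative.
import argparse

from moss.loganalysis import tool_stats  # hypothetical module (sketch above)

def main() -> None:
    parser = argparse.ArgumentParser(prog="moss analyze-logs")
    parser.add_argument("log", help="path to a Claude Code session .jsonl")
    parser.add_argument("--top", type=int, default=10,
                        help="show the N most-used tools")
    args = parser.parse_args()

    stats = tool_stats(args.log)
    ranked = sorted(stats.items(), key=lambda kv: -kv[1]["calls"])
    for tool, s in ranked[:args.top]:
        rate = 100 * (1 - s["errors"] / s["calls"])
        print(f"{tool:<12} {s['calls']:>5} calls  {rate:5.1f}% success")

if __name__ == "__main__":
    main()
```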
## Token Usage
Note: log entries are duplicated per streaming chunk. Numbers below group by unique `requestId`.
| Metric | Count |
|---|---|
| Unique API Calls | 3,716 |
| New Input Tokens | 33K |
| Cache Creation | 7.4M |
| Cache Read | 362M |
| Output Tokens | 1.45M |
| Total Context Processed | 372M |
### Context Size Distribution
| Metric | Tokens |
|---|---|
| Minimum context | 19K |
| Maximum context | 156K |
| Average context | 100K |
Context grew from ~19K to ~156K tokens as the session progressed over 31 hours.
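
A sketch of that computation: for each unique `requestId`, the context size is the sum of new input, cache-creation, and cache-read tokens (field names as observed in these logs):

```python
# Per-call context size = new input + cache creation + cache read (sketch).
import json
from pathlib import Path

def context_sizes(log_path: str) -> list[int]:
    per_request: dict[str, int] = {}
    for line in Path(log_path).expanduser().read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        usage = (entry.get("message") or {}).get("usage")
        rid = entry.get("requestId")
        if not (usage and rid):
            continue
        context = (usage.get("input_tokens", 0)
                   + usage.get("cache_creation_input_tokens", 0)
                   + usage.get("cache_read_input_tokens", 0))
        per_request[rid] = max(per_request.get(rid, 0), context)
    return sorted(per_request.values())

sizes = context_sizes("session.jsonl")
print(f"min {sizes[0]:,}  max {sizes[-1]:,}  avg {sum(sizes) // len(sizes):,}")
```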
### Cost Estimate (Opus 4.5 API rates)

Opus 4.5: $5/M input, $25/M output. Cache reads are billed at a 90% discount; cache writes at a 25% premium.
| Component | Tokens | Rate | Cost |
|---|---|---|---|
| Cache read | 362M | $0.50/M | $181 |
| Cache create | 7.4M | $6.25/M | $46 |
| New input | 33K | $5/M | $0.17 |
| Output | 1.45M | $25/M | $36 |
| **Total** | | | ~$263 |
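
The table's arithmetic as a small helper (token counts in millions, rates in $/M; cache reads billed at 10% of the input rate, writes at 125%):

```python
def estimate_cost(cache_read_m: float, cache_create_m: float,
                  new_input_m: float, output_m: float,
                  input_rate: float, output_rate: float) -> float:
    """Token counts in millions; rates in $/M. Cache read = 10% of the input
    rate, cache write = 125% of the input rate."""
    return (cache_read_m * input_rate * 0.10
            + cache_create_m * input_rate * 1.25
            + new_input_m * input_rate
            + output_m * output_rate)

# Opus 4.5 over this session's totals:
print(round(estimate_cost(362, 7.4, 0.033, 1.45, input_rate=5, output_rate=25)))
# -> 264 (the table rounds each component first, giving ~$263)
```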
Note: a Claude Max subscription ($100-200/mo) covers Claude Code usage at a flat rate (within plan limits), so actual user cost depends on subscription tier, not raw API rates.
## Cost Comparison: Alternative Models & Paradigms
Baseline numbers from this session:
- 3,716 API calls
- ~370M input tokens processed (97% from cache)
- 1.45M output tokens
- Average context: 100K tokens/call
### a) Gemini Flash ($0.50/M input, $3/M output, 90% auto-cache)
| Component | Tokens | Rate | Cost |
|---|---|---|---|
| Cached input | 359M | $0.05/M | $18 |
| Non-cached input | 11M | $0.50/M | $6 |
| Output | 1.45M | $3/M | $4 |
| **Total** | | | ~$28 |
9x cheaper than Opus ($263 → $28). Most of the savings come from the 10x cheaper cache reads ($181 → $18), with cheaper output ($36 → $4) second.
### b) Moss Paradigm (conservative 10x context reduction)
Target: 10K tokens/call instead of 100K (conservative; vision is 1-5K).
| Metric | Current | Moss |
|---|---|---|
| Context/call | 100K | 10K |
| Total input | 370M | 37M |
| Output (est. 30% reduction) | 1.45M | 1.0M |
**Moss + Opus 4.5:**
| Component | Tokens | Rate | Cost |
|---|---|---|---|
| Cached input | 35.9M | $0.50/M | $18 |
| Cache create | 740K | $6.25/M | $5 |
| New input | 360K | $5/M | $2 |
| Output | 1.0M | $25/M | $25 |
| **Total** | | | ~$50 |
5x cheaper than current ($263 → $50).
**Moss + Gemini Flash:**
| Component | Tokens | Rate | Cost |
|---|---|---|---|
| Cached input | 33.3M | $0.05/M | $2 |
| Non-cached input | 3.7M | $0.50/M | $2 |
| Output | 1.0M | $3/M | $3 |
| **Total** | | | ~$7 |
38x cheaper than current Opus ($263 → $7).
### Summary
| Configuration | Cost | vs Current |
|---|---|---|
| Opus 4.5 (current) | $263 | baseline |
| Gemini Flash (same paradigm) | $28 | 9x cheaper |
| Opus 4.5 + Moss | $50 | 5x cheaper |
| Gemini Flash + Moss | $7 | 38x cheaper |
The moss paradigm's value compounds with cheaper models. Reduced context means:
- Less to cache/transmit
- More focused model attention (better quality)
- Faster response times
## Data Deduplication Note

The raw log has 8,500+ assistant entries because each streaming chunk is logged separately. Grouping by `requestId` and taking max values gives the correct per-call totals.
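
Concretely, a sketch of that grouping, assuming the usage fields observed here; naively summing rows would overcount heavily:

```python
# Deduplicate streaming chunks: per requestId, take the max of each counter.
import json
from collections import defaultdict
from pathlib import Path

FIELDS = ("input_tokens", "cache_creation_input_tokens",
          "cache_read_input_tokens", "output_tokens")

def usage_by_request(log_path: str) -> dict[str, dict[str, int]]:
    totals: dict[str, dict[str, int]] = defaultdict(
        lambda: dict.fromkeys(FIELDS, 0))
    for line in Path(log_path).expanduser().read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        usage = (entry.get("message") or {}).get("usage")
        rid = entry.get("requestId")
        if usage and rid:
            for field in FIELDS:
                totals[rid][field] = max(totals[rid][field],
                                         usage.get(field, 0))
    return dict(totals)
```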
## Architectural Insights

### The Real Problem: Context Accumulation
The chatlog-as-context paradigm is fundamentally flawed:
- Context grew 19K → 156K tokens over 31 hours
- 99% of tokens are re-sent unchanged every turn
- "Lost in the middle" - models degrade at high context
- Signal drowns in noise
### Why All Agentic Tools Do This

Every tool (Claude Code, Cursor, Copilot, etc.) uses a similar approach, not because it's optimal, but because:
- LLM APIs are stateless - must send context each call
- Summarization risks losing critical details
- RAG retrieval might miss something
- "Good enough" for short sessions
- Engineering complexity / risk aversion
### The Better Approach (Moss Vision)

**For code:** a Merkle tree of structural views (sketched after the list below)
- Hash AST/modules/deps
- Know what changed instantly
- Invalidate cached summaries surgically
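
A minimal content-level sketch of the Merkle idea; real moss would hash structural views (ASTs, module graphs, dependencies) rather than raw bytes, but the invalidation logic is the same:

```python
# Merkle-style tree hashing: a subtree's hash changes iff something in it
# changed, so cached summaries can be invalidated surgically.
import hashlib
from pathlib import Path

def tree_hash(root: Path, hashes: dict[Path, str]) -> str:
    """Fill `hashes` bottom-up: files hash their bytes, directories hash
    the names and hashes of their children."""
    if root.is_file():
        digest = hashlib.sha256(root.read_bytes()).hexdigest()
    else:
        children = sorted(p for p in root.iterdir()
                          if not p.name.startswith("."))
        digest = hashlib.sha256("".join(
            f"{p.name}:{tree_hash(p, hashes)}" for p in children
        ).encode()).hexdigest()
    hashes[root] = digest
    return digest

def changed_paths(old: dict[Path, str], new: dict[Path, str]) -> set[Path]:
    """Subtrees whose hash differs are exactly the ones to re-summarize."""
    return {path for path, digest in new.items() if old.get(path) != digest}

# Usage: before = {}; tree_hash(Path("src"), before); ...edit files...
# after = {}; tree_hash(Path("src"), after); changed_paths(before, after)
```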
**For chat:** don't keep history; instead assemble a compiled context (sketched below) from:
- Task state: "implementing feature X"
- Retrieved memories: RAG for relevant past learnings
- Recent turns: last 2-3 exchanges only
- Compiled context: skeleton/deps, not raw files
**Target:** 1-10K tokens instead of 100K+
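
A hypothetical sketch of assembling such a compiled context under a token budget; the inputs (task state, retrieved memories, recent turns, code skeleton) stand in for whatever moss would actually produce:

```python
def build_context(task_state: str, memories: list[str], recent_turns: list[str],
                  skeleton: str, budget_tokens: int = 10_000) -> str:
    """Compiled context: task, retrieved learnings, last turns, code skeleton."""
    parts = [
        "## Task\n" + task_state,
        "## Relevant memories\n" + "\n".join(f"- {m}" for m in memories[:5]),
        "## Recent turns\n" + "\n\n".join(recent_turns[-3:]),
        "## Code skeleton\n" + skeleton,
    ]
    context = "\n\n".join(parts)
    # Rough budget check; ~4 characters per token is a common heuristic.
    if len(context) > budget_tokens * 4:
        raise ValueError("compiled context over budget; tighten retrieval")
    return context
```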
### Why This Should Be More Reliable

Curated context isn't just cheaper; it's better:
- Model focuses on relevant signal
- No "lost in the middle" degradation
- Less noise = fewer hallucinations
- Faster iteration cycles
*Analysis performed on a session from Dec 17-18, 2025.*