
Claude Code Session Log Analysis

Analysis of session logs from ~/.claude/projects/-home-me-git-moss/.

Session Overview

  • Main session: 6cb78e3b-873f-4a1f-aebe-c98d701052e3.jsonl (59MB)
  • Subagents: 15+ agent files (200KB-500KB each)

Message Types

| Type                  | Count |
|-----------------------|-------|
| assistant             | 8,512 |
| user                  | 3,918 |
| file-history-snapshot | 319   |
| queue-operation       | 110   |
| system                | 41    |
| summary               | 1     |

Tool Usage

| Tool      | Calls | Errors | Success Rate |
|-----------|-------|--------|--------------|
| Bash      | 1,372 | 234    | 83%          |
| Edit      | 990   | 7      | 99%          |
| Read      | 644   | 4      | 99%          |
| TodoWrite | 286   | 0      | 100%         |
| Write     | 253   | 9      | 96%          |
| Grep      | 221   | 0      | 100%         |
| Glob      | 40    | 0      | 100%         |
| Task      | 12    | 0      | 100%         |
| Other     | 35    | 3      | 91%          |

Total: 3,853 tool calls, 257 errors (93% success rate)

Note: "File not read yet" errors are attributed to Write (the tool that checks), not Edit.

Bash Command Patterns

Most common command prefixes:

  • uv (550): Running tests, python scripts, package management
  • ruff (331): Linting and formatting
  • git (314): Version control
  • grep (41): Text search (despite the dedicated Grep tool)
  • nix (12): Development environment
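A sketch of how these prefix counts could be derived, under the same assumed log layout (the helper name is hypothetical):

```python
import json
from collections import Counter

def bash_prefixes(path: str) -> Counter:
    """Tally the first word of every unique Bash tool_use command."""
    commands: dict[str, str] = {}              # tool_use id -> command string
    with open(path) as fh:
        for line in fh:
            entry = json.loads(line)
            content = (entry.get("message") or {}).get("content") or []
            if not isinstance(content, list):
                continue
            for block in content:
                if (isinstance(block, dict) and block.get("type") == "tool_use"
                        and block.get("name") == "Bash"):
                    commands[block["id"]] = (block.get("input") or {}).get("command", "")
    return Counter(cmd.split()[0] for cmd in commands.values() if cmd.strip())
```

Calling `bash_prefixes(path).most_common(5)` would produce a list like the one above.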

Error Analysis

Bash Errors (234 total)

Exit codes:

  • Exit code 1: 222 (general failure)
  • Exit code 2: 7 (misuse)
  • Exit code 124: 4 (timeout)
  • Exit code 127: 1 (command not found)

Common error categories:

  1. Test failures (pytest): Test assertions failing during development
  2. Lint errors (ruff): Code style violations caught during check
  3. Build errors: Missing dependencies, import errors
  4. LSP server tests: Specific test file with persistent failures
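One way to recover the exit-code breakdown above is to scan failed tool_result payloads for an exit-code marker. The exact wording of the error text is an assumption (adjust the pattern to whatever your log version emits), and restricting the scan to Bash results would reuse the tool_use-id mapping from the earlier sketch.

```python
import json
import re
from collections import Counter

EXIT_RE = re.compile(r"exit code:?\s*(\d+)", re.IGNORECASE)   # assumed wording

def error_exit_codes(path: str) -> Counter:
    """Bucket failed tool_results by the exit code mentioned in their text."""
    codes = Counter()
    with open(path) as fh:
        for line in fh:
            entry = json.loads(line)
            content = (entry.get("message") or {}).get("content") or []
            if not isinstance(content, list):
                continue
            for block in content:
                if not (isinstance(block, dict) and block.get("type") == "tool_result"
                        and block.get("is_error")):
                    continue
                raw = block.get("content")
                if isinstance(raw, list):       # content may be a list of text blocks
                    raw = " ".join(b.get("text", "") for b in raw if isinstance(b, dict))
                if not isinstance(raw, str):
                    continue
                match = EXIT_RE.search(raw)
                if match:
                    codes[int(match.group(1))] += 1
    return codes
```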

File Operation Errors (20 total)

| Error                    | Count | Cause                      |
|--------------------------|-------|----------------------------|
| File not read yet        | 9     | Write/Edit before Read     |
| File does not exist      | 4     | Read/Write to missing file |
| Found N matches          | 3     | Edit string not unique     |
| String not found         | 2     | Edit target changed        |
| File modified since read | 2     | Linter changed file        |

Other Errors (3 total)

  • 2: User plan refinements (interactive editing, not rejections)
  • 1: KillShell on already-completed shell

User Rejections

The 2 "user doesn't want to proceed" errors were actually plan refinements - the user was editing/improving proposed plans via Claude Code's interactive mode, not rejecting work outright.

Observations

Patterns

  1. High Bash usage for testing: Running uv run pytest frequently, which is expected during development
  2. Linting integrated into workflow: ruff check and ruff format called 331 times
  3. Git operations frequent: 314 git commands for commits, status checks
  4. TodoWrite heavily used: 286 calls, showing good task tracking discipline

Potential Inefficiencies

  1. Using grep in Bash (41 calls) when the dedicated Grep tool exists and is optimized
  2. File read-before-write errors (9 cases): Could be addressed with better tool sequencing
  3. Non-unique Edit strings (3 cases): Need more context in edit operations
  4. Low parallel tool usage: 99.95% of turns have only 1 content block - could parallelize more

Turn Patterns

| Content Blocks | Count |
|----------------|-------|
| 1              | 8,562 |
| 2              | 2     |
| 3              | 2     |

Almost all turns are sequential (1 tool call at a time). There's significant opportunity to parallelize independent operations like:

  • Reading multiple files at once
  • Running git status + git diff in parallel
  • Multiple Grep searches
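For reference, the block-width distribution above can be tallied by grouping streamed assistant chunks by requestId and keeping the widest chunk per turn. A sketch under the same field-name assumptions as the earlier helpers:

```python
import json
from collections import Counter, defaultdict

def blocks_per_turn(path: str) -> Counter:
    """Distribution of content-block counts per assistant turn (requestId)."""
    widest: dict[str, int] = defaultdict(int)
    with open(path) as fh:
        for line in fh:
            entry = json.loads(line)
            if entry.get("type") != "assistant":
                continue
            content = (entry.get("message") or {}).get("content") or []
            # fall back to the entry's uuid (assumed field) when requestId is absent
            key = entry.get("requestId") or entry.get("uuid", "")
            widest[key] = max(widest[key], len(content))
    return Counter(widest.values())
```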

What's Working Well

  1. Edit success rate 99%: Very reliable file editing
  2. Read success rate 99%: File reading rarely fails
  3. TodoWrite 100% success: Task tracking working flawlessly
  4. Subagent usage: Only 12 Task calls, showing restraint in spawning subagents
  5. Subagents are focused: Mostly use Read/Grep/Glob for exploration, minimal Bash

Subagent Patterns

15 subagents were spawned; their tool usage shows they are used for research:

  • Read: Primary tool (11-27 calls per agent)
  • Grep: Secondary (1-7 calls)
  • Glob: File finding (1-4 calls)
  • Bash: Minimal (0-10 calls, usually for running code)

Subagents don't edit files - they gather information for the main session.

Recommendations

For Claude Code Users

  1. Prefer the Grep tool over bash grep: More integrated, better error handling
  2. Always Read before Write/Edit: Avoid the 9 "not read yet" errors
  3. Provide more context in Edit: Avoid non-unique string matches
  4. Parallelize independent operations: Read multiple files, run multiple commands in one turn

For This Project

  1. LSP tests need attention: Multiple failures in test_lsp_server.py
  2. Consider log analysis tooling: This manual analysis could be automated as a moss command

Future Work

Could build a moss analyze-logs command that:

  • Parses Claude Code session logs
  • Computes tool success rates
  • Identifies error patterns
  • Suggests workflow improvements
  • Estimates token costs

This would help users understand their agent interaction patterns.
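A rough shape for such a command, reusing the helpers sketched earlier in this document (nothing here exists in moss yet; the subcommand name and output format are placeholders):

```python
import argparse
from pathlib import Path

def main() -> None:
    parser = argparse.ArgumentParser(prog="moss analyze-logs",
                                     description="Summarize a Claude Code session log")
    parser.add_argument("log", type=Path, help="path to a session .jsonl file")
    args = parser.parse_args()

    calls, errors = tool_stats(str(args.log))        # success rates (sketched above)
    prefixes = bash_prefixes(str(args.log))          # Bash command patterns (sketched above)
    for tool, n in calls.most_common():
        bad = errors.get(tool, 0)
        print(f"{tool:12} {n:5} calls  {bad:3} errors  {100 * (n - bad) / n:.0f}% ok")
    print("Top bash prefixes:", prefixes.most_common(5))

if __name__ == "__main__":
    main()
```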

Token Usage

Note: Log entries are duplicated per streaming chunk. Numbers below use unique requestId grouping.

| Metric                  | Count |
|-------------------------|-------|
| Unique API Calls        | 3,716 |
| New Input Tokens        | 33K   |
| Cache Creation          | 7.4M  |
| Cache Read              | 362M  |
| Output Tokens           | 1.45M |
| Total Context Processed | 372M  |

Context Size Distribution

| Metric          | Tokens |
|-----------------|--------|
| Minimum context | 19K    |
| Maximum context | 156K   |
| Average context | 100K   |

Context grew from ~19K to ~156K tokens as the session progressed over 31 hours.

Cost Estimate (Opus 4.5 API rates)

Opus 4.5: $5/M input, $25/M output. Cache: 90% discount on reads, 25% premium on writes.

| Component    | Tokens | Rate    | Cost  |
|--------------|--------|---------|-------|
| Cache read   | 362M   | $0.50/M | $181  |
| Cache create | 7.4M   | $6.25/M | $46   |
| New input    | 33K    | $5/M    | $0.17 |
| Output       | 1.45M  | $25/M   | $36   |
| Total        |        |         | ~$263 |

Note: A Claude Max subscription ($100-200/mo) covers Claude Code usage within its rate limits, so actual user cost depends on subscription tier, not raw API rates.
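The arithmetic behind this table is straightforward. A sketch, with token counts in millions and the rates quoted above; swapping in the Gemini Flash rates or the reduced Moss token counts approximately reproduces the comparison tables in the next section:

```python
def api_cost(new_input: float, cache_create: float, cache_read: float, output: float,
             in_rate: float = 5.0, out_rate: float = 25.0) -> float:
    """Dollar cost given token counts in millions and $/M rates."""
    return (new_input * in_rate
            + cache_create * in_rate * 1.25    # 25% cache-write premium
            + cache_read * in_rate * 0.10      # 90% cache-read discount
            + output * out_rate)

# This session at Opus 4.5 rates: roughly the ~$263 above (component rounding differs slightly)
print(f"${api_cost(new_input=0.033, cache_create=7.4, cache_read=362, output=1.45):.0f}")
```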

Cost Comparison: Alternative Models & Paradigms

Baseline numbers from this session:

  • 3,716 API calls
  • ~370M input tokens processed (97% from cache)
  • 1.45M output tokens
  • Average context: 100K tokens/call

a) Gemini Flash ($0.50/M input, $3/M output, 90% auto-cache)

| Component        | Tokens | Rate    | Cost |
|------------------|--------|---------|------|
| Cached input     | 359M   | $0.05/M | $18  |
| Non-cached input | 11M    | $0.50/M | $6   |
| Output           | 1.45M  | $3/M    | $4   |
| Total            |        |         | ~$28 |

9x cheaper than Opus ($263 → $28) - output pricing dominates ($25/M → $3/M).

b) Moss Paradigm (conservative 10x context reduction)

Target: 10K tokens/call instead of 100K (conservative; vision is 1-5K).

| Metric                      | Current | Moss |
|-----------------------------|---------|------|
| Context/call                | 100K    | 10K  |
| Total input                 | 370M    | 37M  |
| Output (est. 30% reduction) | 1.45M   | 1.0M |

Moss + Opus 4.5:

| Component    | Tokens | Rate    | Cost |
|--------------|--------|---------|------|
| Cached input | 35.9M  | $0.50/M | $18  |
| Cache create | 740K   | $6.25/M | $5   |
| New input    | 360K   | $5/M    | $2   |
| Output       | 1.0M   | $25/M   | $25  |
| Total        |        |         | ~$50 |

5x cheaper than current ($263 → $50).

Moss + Gemini Flash:

| Component        | Tokens | Rate    | Cost |
|------------------|--------|---------|------|
| Cached input     | 33.3M  | $0.05/M | $2   |
| Non-cached input | 3.7M   | $0.50/M | $2   |
| Output           | 1.0M   | $3/M    | $3   |
| Total            |        |         | ~$7  |

38x cheaper than current Opus ($263 → $7).

Summary

| Configuration                | Cost | vs Current  |
|------------------------------|------|-------------|
| Opus 4.5 (current)           | $263 | baseline    |
| Gemini Flash (same paradigm) | $28  | 9x cheaper  |
| Opus 4.5 + Moss              | $50  | 5x cheaper  |
| Gemini Flash + Moss          | $7   | 38x cheaper |

The moss paradigm's value compounds with cheaper models. Reduced context means:

  • Less to cache/transmit
  • More focused model attention (better quality)
  • Faster response times

Data Deduplication Note

The raw log has 8,500+ assistant entries because each streaming chunk is logged separately. Grouping by requestId and taking max values gives the correct per-call totals.
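A minimal sketch of that grouping. The usage field names (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`, `output_tokens`) match this session's logs but should be verified against your own:

```python
import json
from collections import defaultdict

USAGE_KEYS = ("input_tokens", "cache_creation_input_tokens",
              "cache_read_input_tokens", "output_tokens")

def usage_totals(path: str) -> dict[str, int]:
    """Per-session token totals, deduplicated by requestId (max per streamed chunk)."""
    per_request: dict[str, dict[str, int]] = defaultdict(lambda: dict.fromkeys(USAGE_KEYS, 0))
    with open(path) as fh:
        for line in fh:
            entry = json.loads(line)
            usage = (entry.get("message") or {}).get("usage")
            request_id = entry.get("requestId")
            if not usage or not request_id:
                continue
            bucket = per_request[request_id]
            for key in USAGE_KEYS:
                bucket[key] = max(bucket[key], usage.get(key) or 0)
    totals = dict.fromkeys(USAGE_KEYS, 0)
    for bucket in per_request.values():
        for key in USAGE_KEYS:
            totals[key] += bucket[key]
    totals["api_calls"] = len(per_request)
    return totals
```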

Architectural Insights

The Real Problem: Context Accumulation

The chatlog-as-context paradigm is fundamentally flawed:

  • Context grew 19K → 156K tokens over 31 hours
  • 99% of tokens are re-sent unchanged every turn
  • "Lost in the middle" - models degrade at high context
  • Signal drowns in noise

Why All Agentic Tools Do This

Every tool (Claude Code, Cursor, Copilot, etc.) uses a similar approach - not because it's optimal, but because:

  1. LLM APIs are stateless - must send context each call
  2. Summarization risks losing critical details
  3. RAG retrieval might miss something
  4. "Good enough" for short sessions
  5. Engineering complexity / risk aversion

The Better Approach (Moss Vision)

For code: Merkle tree of structural views

  • Hash AST/modules/deps
  • Know what changed instantly
  • Invalidate cached summaries surgically

For chat: Don't keep history

  • Task state: "implementing feature X"
  • Retrieved memories: RAG for relevant past learnings
  • Recent turns: last 2-3 exchanges only
  • Compiled context: skeleton/deps, not raw files

Target: 1-10K tokens instead of 100K+
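A toy illustration of the code side of this idea (not moss's actual implementation): hash leaves, roll child hashes up into parent hashes, and diff two snapshots to find exactly which subtrees - and therefore which cached views - changed. A real version would hash structural views (AST, module skeletons, deps) rather than raw bytes.

```python
import hashlib
from pathlib import Path

def tree_hash(node: Path, out: dict[Path, str]) -> str:
    """Merkle-style hash: a file hashes its bytes, a directory hashes its children."""
    hasher = hashlib.sha256()
    if node.is_file():
        hasher.update(node.read_bytes())
    else:
        for child in sorted(node.iterdir()):
            hasher.update(child.name.encode())
            hasher.update(tree_hash(child, out).encode())
    out[node] = hasher.hexdigest()
    return out[node]

def changed_subtrees(old: dict[Path, str], new: dict[Path, str]) -> set[Path]:
    """Only these paths need their cached summaries/views recomputed."""
    return {path for path, digest in new.items() if old.get(path) != digest}

# Usage sketch: snapshot before and after an edit, then diff.
# before: dict[Path, str] = {}; tree_hash(Path("src"), before)
# after:  dict[Path, str] = {}; tree_hash(Path("src"), after)
# changed_subtrees(before, after)
```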

Why This Should Be More Reliable

Curated context isn't just cheaper - it's better:

  • Model focuses on relevant signal
  • No "lost in the middle" degradation
  • Less noise = fewer hallucinations
  • Faster iteration cycles

Analysis performed on session from Dec 17-18, 2025