The Way You Structure Context Matters as Much as the Context Itself
Why manual context documents fail and how structured conversation replay preserves LLM reasoning.
I noticed something while working with Claude Code.
When I kept a chat open for days — returning to it, asking follow-up questions, building on earlier answers — the responses were sharp. The model felt like it understood the problem deeply.
But when the context was nearing its limit, or when I needed the same background for a separate parallel task — I did what seemed logical: wrote everything to a markdown file and started fresh. The quality dropped. The model felt like it was meeting the problem for the first time.
Same information. Worse results.
I started wondering: is there some secret persistent memory these frontier models maintain across days? Some connection that stays alive, quietly holding onto your session?
The “Structure” Gap
The truth is simpler but more profound: LLMs process structured conversation history differently than they process flat documents. When we flatten a chat into a Markdown file, we strip away the “texture” of the reasoning—the sequence of ruled-out alternatives, the model’s internal thinking, and the specific turn-by-turn evolution of a solution.
To bridge this gap, I built rethread. It is a Go-based CLI tool designed to treat conversation history as a structured signal rather than a static archive.
How rethread Preserves the “Vibe”
Instead of manual summarization, which is inherently lossy, rethread uses architectural principles to maintain the quality of the original session:
- Skeleton + Recent Strategy: Research shows models attend most to the beginning and end of their context window (the “Lost in the Middle” effect). rethread automatically preserves the first two turns to keep the initial problem framing and fills the remaining budget from the most recent turns.
- Pruning over Summarization: It never rewrites history. A turn is only dropped if it is a low-signal acknowledgment (like “ok” or “sounds good”) and contains no code blocks, URLs, or file paths.
- Thinking Block Recovery: The tool’s clean export format specifically preserves the model’s thinking blocks while stripping out massive “tool_result” noise. Seeing the “why” behind previous decisions is often more important for the next model than just seeing the final code.
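The first two strategies are easy to express in code. Here is a minimal Go sketch of the selection logic, assuming a simplified Turn type, a turn-count budget (rethread’s real accounting is token-based), and illustrative names like keepTurns and isLowSignal that are not the tool’s actual API:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Turn is a simplified conversation turn; the real model is richer.
type Turn struct {
	Role string
	Text string
}

// signalPattern matches code fences, URLs, and common file paths —
// content that makes a turn worth keeping even if it reads like an ack.
var signalPattern = regexp.MustCompile("```|https?://|(^|\\s)[\\w./-]+\\.(go|md|json|py|ts)\\b")

// isLowSignal reports whether a turn is a bare acknowledgment with no
// code blocks, URLs, or file paths worth preserving.
func isLowSignal(t Turn) bool {
	body := strings.TrimSpace(strings.ToLower(t.Text))
	acks := map[string]bool{"ok": true, "okay": true, "sounds good": true, "thanks": true, "got it": true}
	return acks[body] && !signalPattern.MatchString(t.Text)
}

// keepTurns applies the skeleton+recent strategy: always keep the first
// two turns (the problem framing), then fill the remaining budget from
// the end of the history, pruning low-signal acknowledgments along the
// way. Assumes budget >= 2.
func keepTurns(turns []Turn, budget int) []Turn {
	if len(turns) <= budget {
		return turns
	}
	skeleton := turns[:2]
	remaining := budget - len(skeleton)
	var recent []Turn
	for i := len(turns) - 1; i >= 2 && len(recent) < remaining; i-- {
		if isLowSignal(turns[i]) {
			continue // pruned, never summarized
		}
		recent = append([]Turn{turns[i]}, recent...)
	}
	return append(append([]Turn{}, skeleton...), recent...)
}

func main() {
	history := []Turn{
		{"user", "We need to fix the race in the exporter."},
		{"assistant", "Let's start by auditing the goroutines."},
		{"user", "ok"},
		{"assistant", "Found it: the channel close in export.go."},
		{"user", "Ship the fix."},
	}
	for _, t := range keepTurns(history, 4) {
		fmt.Printf("%s: %s\n", t.Role, t.Text)
	}
}
```

Note the design choice: turns are dropped whole or kept whole, so nothing the model sees was ever rewritten.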
Using rethread
```shell
# 1. Discover sessions from all sources (auto-detects Claude and Gemini)
# Lists sessions sorted by recency with turn counts and source identification
rethread list

# 2. Analyze a specific session's "health"
# Checks token counts and recommends the best replay/export strategy
rethread inspect d0873f7a

# 3. Export a structured, high-signal JSONL file
# The 'clean' format keeps text, thinking, and tool_use while dropping bloat
rethread export d0873f7a --format clean --output session.jsonl

# 4. Selective export (only the last 20 turns)
# Useful for quickly pivoting to a sub-task without the full history
rethread export d0873f7a --turns 20 --format markdown
```
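To make the clean format concrete, here is a Go sketch of the kind of filtering it performs. The Block type and field names below are illustrative, following Anthropic-style content block types (text, thinking, tool_use, tool_result) rather than rethread’s actual schema:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block mirrors one content block in an exported turn. The shape here
// is a simplified assumption, not the tool's real export schema.
type Block struct {
	Type string          `json:"type"`
	Body json.RawMessage `json:"body,omitempty"`
}

// cleanBlocks keeps the high-signal block types and drops tool_result
// payloads, which dominate token counts without carrying reasoning.
func cleanBlocks(blocks []Block) []Block {
	keep := map[string]bool{"text": true, "thinking": true, "tool_use": true}
	out := make([]Block, 0, len(blocks))
	for _, b := range blocks {
		if keep[b.Type] {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	raw := []Block{
		{Type: "thinking"},
		{Type: "text"},
		{Type: "tool_use"},
		{Type: "tool_result"}, // e.g. a 40 KB file dump
	}
	for _, b := range cleanBlocks(raw) {
		fmt.Println(b.Type)
	}
}
```

The thinking blocks survive this filter untouched, which is the whole point: the next model inherits the reasoning, not just its artifacts.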
Crossing the Streams: Claude ↔ Gemini
The most powerful use case I’ve found is bridging sessions across models. By normalizing history from both Claude Code and Gemini CLI into a unified structure, I can pick up exactly where I left off, regardless of which tool I’m using.
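Normalization is what makes this bridging possible. The sketch below shows the idea in Go, with hypothetical simplified stand-ins (claudeEntry, geminiEntry) for the two tools’ on-disk formats; rethread’s real unified schema may differ:

```go
package main

import "fmt"

// Turn is a hypothetical unified representation both sources map into.
type Turn struct {
	Source string // "claude" or "gemini"
	Role   string // "user" or "assistant"
	Text   string
}

// claudeEntry and geminiEntry are simplified stand-ins for the
// differently shaped session logs of Claude Code and Gemini CLI.
type claudeEntry struct {
	Type    string // "human" / "assistant"
	Content string
}

type geminiEntry struct {
	Author string // "user" / "model"
	Parts  []string
}

func fromClaude(e claudeEntry) Turn {
	role := "assistant"
	if e.Type == "human" {
		role = "user"
	}
	return Turn{Source: "claude", Role: role, Text: e.Content}
}

func fromGemini(e geminiEntry) Turn {
	role := "assistant"
	if e.Author == "user" {
		role = "user"
	}
	text := ""
	for _, p := range e.Parts {
		text += p
	}
	return Turn{Source: "gemini", Role: role, Text: text}
}

func main() {
	// Once normalized, turns from either tool replay identically.
	turns := []Turn{
		fromClaude(claudeEntry{Type: "human", Content: "Refactor the parser."}),
		fromGemini(geminiEntry{Author: "model", Parts: []string{"Done; see parser.go."}}),
	}
	for _, t := range turns {
		fmt.Printf("[%s] %s: %s\n", t.Source, t.Role, t.Text)
	}
}
```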
The goal isn’t just to give the model the data; it’s to give it the story of the conversation. If you find yourself losing that “sharpness” when starting new sessions, the answer might not be more context—it might be better structure.