I recently had a problem that could be solved with money, which is the worst kind of problem.
I am building a new venture called eh-trade.ca. To make it work, I needed deep financial research on 11,000 different stocks.
The "Enterprise" solution is to buy an API subscription. I looked into this. For my usage, the pricing is somewhere between "$200/month" and "Contact Sales". If you are a micro-preneur like me, "Contact Sales" means "You cannot afford this."
The "AI" solution is to ask an LLM to research each stock, which works really well. But 11,000 requests at $0.05 per research session is still $550. Plus, I don't like renting intelligence. I prefer to own it.
So I decided to use the hardware I already had: a home server with four RTX 3090s. It’s a 96GB VRAM beast that heats my basement and scares my birds.
There was just one problem. The models I can run locally (like qwen3 or phi4) have small usable context windows. Yes, the model cards claim 40K+ tokens of context, but at those lengths they crawl on my hardware, so in practice I get 4K -- enough for a few screenfuls of text. And even when I do enable the long contexts, the models struggle to reason over them. Feed them ten search results about a company's balance sheet and the 'needle in a haystack' effect kicks in: they get confused and start hallucinating dividends that don't exist.
We need to optimize.
The Naive Solution: The Chatty Agent
Most "AI Agents" are built on a simple loop, often called ReAct (Reason + Act). It looks like this:
- User: "Research Apple."
- Agent: "I will search for Apple's revenue."
- Tool: [Returns 500 words of search snippets and thinking]
- Agent: "Okay, now I will search for Apple's debt."
- Tool: [Returns another 500 words and thinking]
The problem is the Context Window. By step 3, your prompt looks like a Walmart receipt. By step 5, you have exceeded 8,000 tokens, and the model forgets what it was doing and starts making stuff up.
Sure, it works fine if you are OpenAI and have infinite GPUs. It does not work if you are running on a consumer card in a closet.
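To see the failure mode in code, here is the naive loop as a minimal sketch. callLLM and runTool are hypothetical stand-ins for a real LLM client and a search tool, not any particular framework's API; the part to stare at is the append.

package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for a real LLM client and a search tool.
func callLLM(prompt string) string  { return "FINAL: an answer" }
func runTool(thought string) string { return "...500 words of search snippets..." }

func researchNaive(question string) string {
	// The entire transcript rides along on every call.
	history := []string{"User: " + question}

	for step := 0; step < 10; step++ {
		// Every turn, the model re-reads EVERYTHING that came before.
		prompt := strings.Join(history, "\n")
		thought := callLLM(prompt)

		if strings.HasPrefix(thought, "FINAL:") {
			return strings.TrimSpace(strings.TrimPrefix(thought, "FINAL:"))
		}

		// Tool output is appended verbatim: ~500 words per step, so the
		// prompt blows past a 4K window within a handful of steps.
		history = append(history, thought, runTool(thought))
	}
	return "ran out of steps"
}

func main() {
	fmt.Println(researchNaive("Research Apple."))
}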
The Pivot: Graph Theory to the Rescue
A few months ago, I read a paper called GraphReader. It proposed a different way to think about long contexts. Instead of dumping everything into a chat log, why not treat information as a graph?
The core insight is that you don't need to remember everything. You only need to remember the Atomic Facts.
An Atomic Fact is a single, indivisible truth.
- "Apple CEO is Tim Cook." -> Fact.
- "Apple revenue is $416B." -> Fact.
- "I searched for Apple and found a cool blog." -> Noise.
If we extract these facts and throw away the rest, we can compress megabytes of web pages into a few kilobytes of JSON.
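To make that compression concrete, here is roughly what the extracted output looks like as data. The AtomicFact struct is my illustration of the idea, not laconic's actual schema, and the URLs are placeholders.

package main

import (
	"encoding/json"
	"fmt"
)

// AtomicFact is an illustrative shape, not laconic's actual schema.
type AtomicFact struct {
	Subject   string `json:"subject"`
	Predicate string `json:"predicate"`
	Object    string `json:"object"`
	Source    string `json:"source"` // where the fact was found
}

func main() {
	// Ten web pages about Apple boil down to entries like these.
	facts := []AtomicFact{
		{"Apple", "CEO", "Tim Cook", "https://example.com/leadership"},
		{"Apple", "revenue", "$416B", "https://example.com/10-k"},
	}
	out, _ := json.MarshalIndent(facts, "", "  ")
	fmt.Println(string(out))
}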
Enter Laconic
I built a library called laconic to implement this. I wrote it in Go, because the rest of my projects are all in Go. Plus, my Python environments always end up as a tangled mess of pip install errors.
Laconic doesn't keep a chat history. It keeps a Notebook. The context window size is O(N), where N is the number of facts, not the number of words.
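Conceptually, the Notebook is just a list of facts plus a queue of open questions. Here is a simplified sketch; the field names are mine, not laconic's internals.

package main

import (
	"fmt"
	"strings"
)

// Notebook is a simplified sketch, not laconic's internal type.
type Notebook struct {
	Facts   []string // atomic facts gathered so far
	Pending []string // search queries still waiting in the queue
}

// Prompt builds the ONLY context the model ever sees: the distilled
// facts plus the one result currently being read.
func (n *Notebook) Prompt(currentResult string) string {
	return "Known facts:\n- " + strings.Join(n.Facts, "\n- ") +
		"\n\nCurrent result:\n" + currentResult
}

func main() {
	nb := &Notebook{Facts: []string{"Apple CEO is Tim Cook.", "Apple revenue is $416B."}}
	fmt.Println(nb.Prompt("Apple's P/E ratio is ... (snippet)"))
}

The prompt is rebuilt from the facts every turn, so it stays small no matter how many pages the agent has read.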
The "Magic" Algorithm
Laconic uses a specific strategy called graph-reader.
- Plan: The LLM breaks the question into Key Elements (e.g., "Revenue", "CEO", "Competitors").
- Explore: It creates a queue of search queries (Nodes).
- Extract: For every search result, it extracts Atomic Facts and adds them to the Notebook.
- Refine: If a fact is fuzzy, it adds a new search query to the queue.
- Answer: Once the Notebook has enough facts, it stops.
The beautiful part? The LLM never sees the full history. It only sees the current Notebook and the current search result. This means I can run complex, multi-step research tasks on a model with a tiny 4K context window, and it never forgets a thing.
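Strung together, the whole strategy is a short loop. This is my paraphrase of it, with trivial stubs standing in where the LLM calls and web searches would go; it is not laconic's actual source.

package main

import "fmt"

// Trivial stubs so the sketch compiles; in the real strategy, each of
// these is an LLM call or a web search.
func plan(q string) []string    { return []string{q + " revenue", q + " CEO"} }
func searchWeb(q string) string { return "snippets for: " + q }

func extractFacts(notebook []string, result string) (facts, followUps []string) {
	return []string{result}, nil
}

func synthesize(question string, facts []string) string {
	return fmt.Sprintf("answer to %q from %d facts", question, len(facts))
}

func graphReader(question string) string {
	queue := plan(question) // Plan: break the question into key elements
	var notebook []string   // the Notebook: atomic facts only

	for len(queue) > 0 && len(notebook) < 50 { // 50 = an arbitrary "enough"
		query := queue[0] // Explore: pop the next node
		queue = queue[1:]

		result := searchWeb(query)

		// Extract: the model sees ONLY the notebook + this one result,
		// never the pile of transcripts behind it.
		facts, followUps := extractFacts(notebook, result)
		notebook = append(notebook, facts...)

		// Refine: fuzzy facts spawn new queries.
		queue = append(queue, followUps...)
	}

	// Answer: synthesize from facts alone.
	return synthesize(question, notebook)
}

func main() {
	fmt.Println(graphReader("Apple"))
}

The queue is where the "graph" lives: each query is a node, and refinement adds edges to nodes we haven't visited yet.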
Here is the code to run a research agent on my home server using Ollama:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/smhanov/laconic"
	"github.com/smhanov/laconic/search"
)

func main() {
	// 1. Connect to the Beast (Ollama)
	model := laconic.NewOllamaProvider("qwen3:32b", "http://localhost:11434")

	// 2. Build the Agent
	agent := laconic.New(
		laconic.WithPlannerModel(model),
		laconic.WithSynthesizerModel(model),
		// Use DuckDuckGo because it is free
		laconic.WithSearchProvider(search.NewDuckDuckGo()),
		// Use the Graph Reader strategy
		laconic.WithStrategyName("graph-reader"),
	)

	// 3. Profit
	ans, err := agent.Answer(context.Background(), "What is the P/E ratio of SHOP.TO?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ans)
}
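Scaling that up to a whole universe of tickers is just a loop around Answer. Here is a sketch of my batch wrapper, assuming Answer returns (string, error) as the example above suggests; the interface and function are my own scaffolding, not part of laconic.

package main

import (
	"context"
	"fmt"
	"log"
)

// Answerer captures the one method the batch loop needs. I'm assuming
// laconic's agent satisfies it, based on the Answer call above.
type Answerer interface {
	Answer(ctx context.Context, question string) (string, error)
}

// researchAll is my own wrapper, not part of laconic.
func researchAll(agent Answerer, tickers []string) {
	for _, t := range tickers {
		ans, err := agent.Answer(context.Background(),
			"Summarize the financial position of "+t)
		if err != nil {
			log.Printf("%s: %v", t, err) // log it and move on
			continue
		}
		fmt.Printf("=== %s ===\n%s\n", t, ans)
	}
}

func main() {
	// Wire in the agent from the example above, then:
	// researchAll(agent, []string{"SHOP.TO", /* ...11,000 more */})
}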
Is this Ralph?
If you hang out in the parts of the internet where people try to make AI write code without hallucinating, you might have heard of the Ralph Loop.
Popularized by Geoffrey Huntley, the Ralph Loop (named after Ralph Wiggum) is a brute-force solution to the context problem. You write a bash script that:
- Spins up an AI agent.
- Gives it one task from a progress.txt file.
- Waits for it to finish and commit the code to Git.
- Kills the process.
Then it starts over. Fresh context. Zero memory leak. The "memory" is the file system.
Laconic is essentially a Ralph Loop for reading.
Instead of a bash script, it's a Go loop. Instead of git commit, we update a JSON Notebook. But the philosophy is identical: The Context Window is a liability.
Most frameworks try to manage the context window like a precious resource. Ralph and Laconic treat it like a disposable napkin. Use it once, wipe the slate clean, and grab a fresh one.
It turns out that if you treat an LLM like a goldfish with a notepad, it becomes significantly smarter.
Conclusion
By treating context as a scarce resource and using a data structure (a Graph of Facts) instead of a text dump, we can make small, cheap models outperform the giants.
I am currently running this loop on the 11,000 tickers I'm missing basic information on. It will take about a week and cost some electricity, but I would have left the machine on anyway; I'm running a few other things on it.
And if you want to find stocks that are going up, keep an eye on eh-trade.ca. My customers tell me I should brag more, so here goes: I'm up 280% in seven months following the momentum strategies it shows on the main page.
I'll have the data soon, assuming my basement doesn't catch fire.