I recently had a problem that could be solved with money, which is the worst kind of problem.
I am building a new venture called eh-trade.ca. To make it work, I needed deep financial research on 11,000 different stocks.
The "Enterprise" solution is to buy an API subscription. I looked into this. For my usage, the pricing is somewhere between "$200/month" and "Contact Sales". If you are a micro-preneur like me, "Contact Sales" means "You cannot afford this."
The "AI" solution is to ask an LLM to research each stock, which works really well. But 11,000 requests at $0.05 per research session is still $550. Plus, I don't like renting intelligence. I prefer to own it.
So I decided to use the hardware I already had: a home server with four RTX 3090s. It’s a 96GB VRAM beast that heats my basement and scares my birds.
There was just one problem. The models I can run locally (like qwen3 or phi4) have small usable context windows. Yes, the model cards claim 40K+ tokens of context, but at those lengths they crawl on my hardware, so in practice I get 4K -- enough for a few screenfuls of text. And even when I do enable the long contexts, the models struggle to reason over them. Feed them ten search results about a company's balance sheet and the 'needle in a haystack' effect kicks in: they get confused and start hallucinating dividends that don't exist.
We need to optimize.
The Naive Solution: The Chatty Agent
Most "AI Agents" are built on a simple loop, often called ReAct (Reason + Act). It looks like this:
- User: "Research Apple."
- Agent: "I will search for Apple's revenue."
- Tool: [Returns 500 words of search snippets and thinking]
- Agent: "Okay, now I will search for Apple's debt."
- Tool: [Returns another 500 words and thinking]
The problem is the Context Window. By step 3, your prompt looks like a Walmart receipt. By step 5, you have exceeded 8,000 tokens, and the model forgets what it was doing and starts making stuff up.
Sure, it works fine if you are OpenAI and have infinite GPUs. It does not work if you are running on a consumer card in a closet.
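To see the failure mode in code, here is the naive loop as a minimal sketch. callLLM and runTool are hypothetical stand-ins for a real LLM client and a search tool, not any particular framework's API; the part to stare at is the append.

package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for a real LLM client and a search tool.
func callLLM(prompt string) string  { return "FINAL: an answer" }
func runTool(thought string) string { return "...500 words of search snippets..." }

func researchNaive(question string) string {
	// The entire transcript rides along on every call.
	history := []string{"User: " + question}

	for step := 0; step < 10; step++ {
		// Every turn, the model re-reads EVERYTHING that came before.
		prompt := strings.Join(history, "\n")
		thought := callLLM(prompt)

		if strings.HasPrefix(thought, "FINAL:") {
			return strings.TrimSpace(strings.TrimPrefix(thought, "FINAL:"))
		}

		// Tool output is appended verbatim: ~500 words per step, so the
		// prompt blows past a 4K window within a handful of steps.
		history = append(history, thought, runTool(thought))
	}
	return "ran out of steps"
}

func main() {
	fmt.Println(researchNaive("Research Apple."))
}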
The Pivot: Graph Theory to the Rescue
A few months ago, I read a paper called GraphReader. It proposed a different way to think about long contexts. Instead of dumping everything into a chat log, why not treat information as a graph?
The core insight is that you don't need to remember everything. You only need to remember the Atomic Facts.
An Atomic Fact is a single, indivisible truth.
- "Apple CEO is Tim Cook." -> Fact.
- "Apple revenue is $416B." -> Fact.
- "I searched for Apple and found a cool blog." -> Noise.
If we extract these facts and throw away the rest, we can compress megabytes of web pages into a few kilobytes of JSON.
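To make that compression concrete, here is roughly what the extracted output looks like as data. The AtomicFact struct is my illustration of the idea, not laconic's actual schema, and the URLs are placeholders.

package main

import (
	"encoding/json"
	"fmt"
)

// AtomicFact is an illustrative shape, not laconic's actual schema.
type AtomicFact struct {
	Subject   string `json:"subject"`
	Predicate string `json:"predicate"`
	Object    string `json:"object"`
	Source    string `json:"source"` // where the fact was found
}

func main() {
	// Ten web pages about Apple boil down to entries like these.
	facts := []AtomicFact{
		{"Apple", "CEO", "Tim Cook", "https://example.com/leadership"},
		{"Apple", "revenue", "$416B", "https://example.com/10-k"},
	}
	out, _ := json.MarshalIndent(facts, "", "  ")
	fmt.Println(string(out))
}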
Enter Laconic
I built a library called laconic to implement this. I wrote it in Go, because the rest of my projects are all in Go. Plus, my Python environments always end up as a tangled mess of pip install errors.
Laconic doesn't keep a chat history. It keeps a Notebook. The context window size is O(N), where N is the number of facts, not the number of words.
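Conceptually, the Notebook is just a list of facts plus a queue of open questions. Here is a simplified sketch; the field names are mine, not laconic's internals.

package main

import (
	"fmt"
	"strings"
)

// Notebook is a simplified sketch, not laconic's internal type.
type Notebook struct {
	Facts   []string // atomic facts gathered so far
	Pending []string // search queries still waiting in the queue
}

// Prompt builds the ONLY context the model ever sees: the distilled
// facts plus the one result currently being read.
func (n *Notebook) Prompt(currentResult string) string {
	return "Known facts:\n- " + strings.Join(n.Facts, "\n- ") +
		"\n\nCurrent result:\n" + currentResult
}

func main() {
	nb := &Notebook{Facts: []string{"Apple CEO is Tim Cook.", "Apple revenue is $416B."}}
	fmt.Println(nb.Prompt("Apple's P/E ratio is ... (snippet)"))
}

The prompt is rebuilt from the facts every turn, so it stays small no matter how many pages the agent has read.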
The "Magic" Algorithm
Laconic uses a specific strategy called graph-reader.
- Plan: The LLM breaks the question into Key Elements (e.g., "Revenue", "CEO", "Competitors").
- Explore: It creates a queue of search queries (Nodes).
- Extract: For every search result, it extracts Atomic Facts and adds them to the Notebook.
- Refine: If a fact is fuzzy, it adds a new search query to the queue.
- Answer: Once the Notebook has enough facts, it stops.
The beautiful part? The LLM never sees the full history. It only sees the current Notebook and the current search result. This means I can run complex, multi-step research tasks on a model with a tiny 4K context window, and it never forgets a thing.
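Strung together, the whole strategy is a short loop. This is my paraphrase of it, with trivial stubs standing in where the LLM calls and web searches would go; it is not laconic's actual source.

package main

import "fmt"

// Trivial stubs so the sketch compiles; in the real strategy, each of
// these is an LLM call or a web search.
func plan(q string) []string    { return []string{q + " revenue", q + " CEO"} }
func searchWeb(q string) string { return "snippets for: " + q }

func extractFacts(notebook []string, result string) (facts, followUps []string) {
	return []string{result}, nil
}

func synthesize(question string, facts []string) string {
	return fmt.Sprintf("answer to %q from %d facts", question, len(facts))
}

func graphReader(question string) string {
	queue := plan(question) // Plan: break the question into key elements
	var notebook []string   // the Notebook: atomic facts only

	for len(queue) > 0 && len(notebook) < 50 { // 50 = an arbitrary "enough"
		query := queue[0] // Explore: pop the next node
		queue = queue[1:]

		result := searchWeb(query)

		// Extract: the model sees ONLY the notebook + this one result,
		// never the pile of transcripts behind it.
		facts, followUps := extractFacts(notebook, result)
		notebook = append(notebook, facts...)

		// Refine: fuzzy facts spawn new queries.
		queue = append(queue, followUps...)
	}

	// Answer: synthesize from facts alone.
	return synthesize(question, notebook)
}

func main() {
	fmt.Println(graphReader("Apple"))
}

The queue is where the "graph" lives: each query is a node, and refinement adds edges to nodes we haven't visited yet.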
Here is the code to run a research agent on my home server using Ollama:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/smhanov/laconic"
	"github.com/smhanov/laconic/search"
)

func main() {
	// 1. Connect to the Beast (Ollama)
	model := laconic.NewOllamaProvider("qwen3:32b", "http://localhost:11434")

	// 2. Build the Agent
	agent := laconic.New(
		laconic.WithPlannerModel(model),
		laconic.WithSynthesizerModel(model),
		// Use DuckDuckGo because it is free
		laconic.WithSearchProvider(search.NewDuckDuckGo()),
		// Use the Graph Reader strategy
		laconic.WithStrategyName("graph-reader"),
	)

	// 3. Profit
	ans, err := agent.Answer(context.Background(), "What is the P/E ratio of SHOP.TO?")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ans)
}
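Scaling that up to a whole universe of tickers is just a loop around Answer. Here is a sketch of my batch wrapper, assuming Answer returns (string, error) as the example above suggests; the interface and function are my own scaffolding, not part of laconic.

package main

import (
	"context"
	"fmt"
	"log"
)

// Answerer captures the one method the batch loop needs. I'm assuming
// laconic's agent satisfies it, based on the Answer call above.
type Answerer interface {
	Answer(ctx context.Context, question string) (string, error)
}

// researchAll is my own wrapper, not part of laconic.
func researchAll(agent Answerer, tickers []string) {
	for _, t := range tickers {
		ans, err := agent.Answer(context.Background(),
			"Summarize the financial position of "+t)
		if err != nil {
			log.Printf("%s: %v", t, err) // log it and move on
			continue
		}
		fmt.Printf("=== %s ===\n%s\n", t, ans)
	}
}

func main() {
	// Wire in the agent from the example above, then:
	// researchAll(agent, []string{"SHOP.TO", /* ...11,000 more */})
}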
Is this Ralph?
If you hang out in the parts of the internet where people try to make AI write code without hallucinating, you might have heard of the Ralph Loop.
Popularized by Geoffrey Huntley, the Ralph Loop (named after Ralph Wiggum) is a brute-force solution to the context problem. You write a bash script that:
- Spins up an AI agent.
- Gives it one task from a progress.txt file.
- Waits for it to finish and commit the code to Git.
- Kills the process.
Then it starts over. Fresh context. Zero memory leak. The "memory" is the file system.
Laconic is essentially a Ralph Loop for reading.
Instead of a bash script, it's a Go loop. Instead of git commit, we update a JSON Notebook. But the philosophy is identical: The Context Window is a liability.
Most frameworks try to manage the context window like a precious resource. Ralph and Laconic treat it like a disposable napkin. Use it once, wipe the slate clean, and grab a fresh one.
It turns out that if you treat an LLM like a goldfish with a notepad, it becomes significantly smarter.
Conclusion
By treating context as a scarce resource and using a data structure (a Graph of Facts) instead of a text dump, we can make small, cheap models outperform the giants.
I am currently running this loop on the 11,000 tickers I'm missing basic information on. It will take about a week and cost some electricity, but I would have left the machine on anyway; I'm running a few other things on it.
And if you want to find stocks that are going up, keep an eye on eh-trade.ca. My customers tell me I should brag more, so here goes: I'm up 280% in seven months following the momentum strategies it shows on the main page.
I'll have the data soon, assuming my basement doesn't catch fire.