researcher. scholar. engineer.
notes from the bench.
About
I work at the intersection of cloud infrastructure, agentic AI systems, reinforcement learning, and multi-agent trust. Fifteen years building production systems. Now turning that engineering lens onto the research bench.
The current empirical work: in long-horizon reinforcement learning, do symbolic observations beat pixel observations once encoder capacity is properly controlled? Pokémon Red is the environment, RecurrentPPO via SB3-Contrib the method.
Research
the current questions
On reinforcement learning.
How does observation representation affect sample efficiency in long-horizon RL?
Pixel vs. symbolic vs. hybrid observations under capacity-matched encoders.
On team architecture.
How does the unit of work change when teams include autonomous agents?
What structures support variable-elasticity teams?
On multi-agent security.
How does trust topology affect security in multi-agent LLM systems?
Centralized orchestrator vs. peer-authenticated trust.
Writing
All writing →
Projects
All projects →
Reinforcement learning research on observation representations in Pokémon Red. Pixel, symbolic, and hybrid conditions under capacity-matched encoders. RecurrentPPO via SB3-Contrib.
github
Kubernetes operator that runs Claude Code agent teams as distributed pods.
github
AI personal executive assistant built around Obsidian.
github