Robotics Reinforcement Learning

Robots that learn to move, grasp, and decide.

TelosRL builds reinforcement-learning systems for legged robots, manipulators, drones, and humanoids — from high-fidelity simulation through validated hardware deployment.

By Howard Cho · MS Robotics & Controls, University of Washington
PPO policy · v34
Isaac Lab · 2048 envs
Reinforcement Learning · Isaac Lab · PPO · Sim-to-Real · Quadrupeds · Humanoids · PyTorch · ROS 2 · TensorRT · CUDA · GRPO · VLM · Edge Inference
12 m/s
Peak release velocity
2,048
Parallel sim envs
4
Robot morphologies
v34+
Policy generations
§ 01
Platforms

Learning across every morphology

One reinforcement-learning pipeline, retargeted across ground, aerial, and bipedal systems — ground truth in simulation, validated on hardware.

PF / 01

Quadrupeds

Dynamic locomotion, arm manipulation, and high-velocity throwing with sim-to-real transfer.

PF / 02

Aerial systems

Autonomous navigation, obstacle avoidance, and precision landing for multi-rotor platforms.

PF / 03

Humanoids

Whole-body control, bipedal locomotion, and dexterous manipulation through learned policies.

PF / 04

Manipulators

Vision-language-guided grasp targeting, constrained manipulation, and contact-rich extraction.

§ 02
Live telemetry

Training in progress

Representative output from a quadruped throwing policy — PPO on 2,048 parallel environments.

~/telos_rl/train_quadruped_throw.py
Reward · mean episode
0
Curriculum stage 3 of 4
Throughput
0k steps/s
GPU 0 util: 92%
Policy iteration
0 / 1024
ETA 02:14:33
§ 03
Research log

Active projects & results

End-to-end systems from reward design to deployed inference. Figures report best validated metrics to date.

Ref · Project & method · Result
R-01
Dynamic arm throwing
PPO policy for high-velocity throwing — kinematic-chain whip, phase clamping, 3→10 m curriculum, velocity-triggered release, FK collision avoidance.
Isaac Lab · RSL-RL · PPO · Sim2Real
12 m/s
Release speed
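
As a rough sanity check on the 12 m/s release speed against the 3→10 m curriculum, a drag-free projectile model relates release speed, release angle, and range. This is a back-of-the-envelope sketch, not the project's training code: it assumes release and landing at the same height, no air resistance, and g = 9.81 m/s², and the function names are illustrative.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed)


def ballistic_range(speed_mps: float, angle_deg: float) -> float:
    """Range of a drag-free projectile that is released and lands at the same height."""
    angle = math.radians(angle_deg)
    return speed_mps ** 2 * math.sin(2 * angle) / G


def min_release_speed(distance_m: float) -> float:
    """Smallest release speed that reaches distance_m (achieved at a 45-degree release)."""
    return math.sqrt(distance_m * G)


if __name__ == "__main__":
    print(f"range at 12 m/s, 45 deg: {ballistic_range(12.0, 45.0):.2f} m")
    for target in (3.0, 10.0):
        print(f"min speed for {target:.0f} m: {min_release_speed(target):.2f} m/s")
```

Under these idealized assumptions, a 12 m/s release covers roughly 14.7 m at 45 degrees, and a 10 m target needs about 9.9 m/s, so the reported release speed leaves margin over the final curriculum distance.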
R-02
Agent autonomy stack
Voice-commanded robot control with vision-language models — person tracking, ASR, GPS fusion, and a tactical bridge on edge compute.
ROS 2 · VLM · YOLO · TensorRT
8 B
Parameters
R-03
Persistent object tracking
Real-time multi-object tracking with a four-state machine, sparse optical flow, and fisheye distortion correction.
BoT-SORT · OpenCV · TensorRT
60 fps
Tracking rate
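
The entry above does not name the four states, so the following is only a sketch of what a four-state track lifecycle could look like. The state names (NEW, TRACKED, LOST, REMOVED) and the `confirm_hits` and `max_misses` thresholds are assumptions chosen for illustration, not the project's actual design.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TrackState(Enum):
    NEW = auto()      # detected, not yet confirmed
    TRACKED = auto()  # confirmed and matched in the latest frame
    LOST = auto()     # confirmed but currently unmatched
    REMOVED = auto()  # dropped for good


@dataclass
class Track:
    track_id: int
    state: TrackState = TrackState.NEW
    hits: int = 1    # consecutive matched frames (the creating detection counts)
    misses: int = 0  # consecutive unmatched frames

    def update(self, matched: bool, confirm_hits: int = 3, max_misses: int = 30) -> TrackState:
        """Advance the lifecycle by one frame; thresholds are hypothetical."""
        if self.state is TrackState.REMOVED:
            return self.state
        if matched:
            self.hits += 1
            self.misses = 0
            # A lost track recovers immediately; a new one needs enough hits.
            if self.state is TrackState.LOST or self.hits >= confirm_hits:
                self.state = TrackState.TRACKED
        else:
            self.hits = 0
            self.misses += 1
            if self.state is TrackState.NEW or self.misses > max_misses:
                self.state = TrackState.REMOVED  # unconfirmed or stale tracks are dropped
            else:
                self.state = TrackState.LOST
        return self.state
```

Keeping a LOST state, rather than deleting a track on the first missed frame, is what lets an object keep its identity through a brief occlusion.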
R-04
Edge LLM inference
Multi-machine local model serving with quantized GGUF models and speculative decoding for robot command interpretation.
llama.cpp · GGUF · CUDA
125 GB
Unified memory
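
For readers new to speculative decoding, the idea can be shown with toy models: a cheap draft model proposes several tokens, and the larger target model verifies them, keeping the agreeing prefix. The sketch below is the greedy variant with made-up integer "models" (`toy_target` and `toy_draft` are hypothetical stand-ins). It does not use llama.cpp, and sampling-based implementations use a probabilistic acceptance rule instead of exact matching.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1) The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2) The target checks each proposed position. A real system scores all
        #    positions in one batched pass; this toy calls the target per position.
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if guess != expected:
                break  # keep the target's correction, discard the rest of the draft
        tokens.extend(accepted)
    return tokens


def toy_target(tokens: Sequence[int]) -> int:
    return (tokens[-1] * 3 + 1) % 17


def toy_draft(tokens: Sequence[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 4.
    guess = toy_target(tokens)
    return guess if len(tokens) % 4 else (guess + 1) % 17
```

In this greedy form the output is identical to decoding with the target alone. The speed-up in practice comes from the target verifying several positions per forward pass whenever the draft guesses well.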
R-05
GRPO policy optimization
Group Relative Policy Optimization with parameter-efficient fine-tuning, converging in under fifteen minutes on edge hardware.
GRPO · PEFT · TRL
14 min
Train time
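
The "group relative" part of GRPO can be illustrated in a few lines: several completions are sampled for the same prompt, and each one's advantage is its reward normalized against the group, so no learned value function is needed. This is a minimal sketch of that normalization only, using the standard library. The use of the population standard deviation and the `eps` value are assumptions, and library implementations such as TRL may differ in detail.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one prompt, scored 1.0 (good) or 0.0 (bad).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average get a positive advantage and the rest get a negative one. If every completion in a group scores the same, all advantages are zero and that group contributes no gradient signal.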
§ 04
Open research

Portfolio & code

Foundational and ongoing work, published openly. The full portfolio site collects write-ups, notebooks, and demos.

Full portfolio & project write-ups: bmaxdk.github.io

Visit GitHub Pages
§ 05
Apps

Software we ship

Native tools built alongside the research. First up: TermX — SSH terminal, files & tunnels, with the whole fleet in your pocket.

TermX SSH

iOS · iPadOS

A fast, native SSH client for iPhone and iPad. Run a real terminal, move files over SFTP, forward ports, and watch live video — all over one encrypted connection. Credentials and keys stay in the Keychain behind Face ID, and the app collects no data.

  • Real xterm terminal
  • Multiple sessions
  • SFTP files
  • Port forwarding
  • Video · MJPEG/RTSP/HLS
  • ed25519 keys
  • Jump hosts
  • iPad split view
  • Keychain · Face ID

home-pc · ssh
    jump@telosrl ~ $ ssh home-pc
    [ok] tunnel via :2222
    Welcome to Ubuntu 22.04 LTS
    r0b0t4rl@home $ nvidia-smi --query
    GPU 0  RTX 5000  61°C  92%  mem 14.2/16 GB
    r0b0t4rl@home $ tail -f train.log
    iter 414  reward 859.4
    iter 415  reward 863.1
    r0b0t4rl@home $
    esc · tab · ctrl · ▲ · ▼

TelosRL apps

  • TermX SSH: Available · iOS & iPadOS
  • In development: Coming soon
  • More to come: Robotics & RL tools
§ 06
Capabilities

A vertically integrated stack

From policy design through deployed inference — every layer owned and validated.

/01

Policy training

PPO, SAC, and GRPO with GPU-parallel environments. Curriculum learning, domain randomization, and reward shaping built in.
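
For reference, the core of PPO is its clipped surrogate objective, which limits how far one update can move the policy away from the policy that collected the data. The sketch below computes that objective for a small batch in plain Python rather than PyTorch so that it runs anywhere. The function name and the `clip_eps = 0.2` default are illustrative assumptions, not settings taken from the TelosRL pipeline.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective (to be maximized) over a batch of samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the probability ratio exceeds 1 + clip_eps. With a negative advantage, the same cut-off applies below 1 - clip_eps. An optimizer would minimize the negative of this value.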

/02

Sim-to-real transfer

Domain randomization and system ID validated on hardware.
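
Domain randomization, in its simplest form, means drawing fresh physical parameters for every simulated episode so the policy cannot overfit to one exact simulator configuration. The sketch below shows the sampling step only. The parameter names and ranges are hypothetical placeholders, not values identified from any TelosRL robot.

```python
import random
from typing import Dict, Tuple

# Hypothetical (low, high) ranges; real ranges would come from system identification.
RANGES: Dict[str, Tuple[float, float]] = {
    "ground_friction": (0.4, 1.2),
    "payload_mass_kg": (0.0, 2.0),
    "motor_strength_scale": (0.8, 1.2),
    "action_latency_s": (0.0, 0.02),
}


def sample_episode_params(rng: random.Random,
                          ranges: Dict[str, Tuple[float, float]] = RANGES) -> Dict[str, float]:
    """Draw one set of physics parameters uniformly from the given ranges."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are reproducible
    for episode in range(3):
        print(episode, sample_episode_params(rng))
```

System identification narrows or centers these ranges around measured hardware behavior, so that randomization covers the plausible real-world variation without making the task needlessly hard.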

/03

Computer vision

Detection, tracking, and pose estimation for edge deployment.

/04

Language models

On-device LLM inference for command parsing.

/05

Systems architecture

ROS 2, sensor fusion, behavior trees, distributed compute.

/06

Performance engineering

CUDA optimization, quantization, real-time tuning.

§ 07
Technology

Tools that ship the work

RL & Simulation / 01

  • Isaac Lab / Sim
  • RSL-RL / PPO
  • PhysX GPU
  • MuJoCo
  • Gymnasium

Robotics / 02

  • ROS 2 Humble
  • Robot SDKs
  • URDF / MJCF
  • Nav2 / SLAM
  • Behavior Trees

AI / ML / 03

  • PyTorch
  • YOLO / TensorRT
  • llama.cpp
  • HuggingFace / PEFT
  • ASR / TTS

Infrastructure / 04

  • CUDA 12 / 13
  • Docker
  • nginx / Tailscale
  • systemd
  • Ubuntu
§ 08
Hardware

Compute fleet

Heterogeneous GPU infrastructure spanning simulation, inference, and edge deployment.

UNIT / 01

Training workstation

Desktop GPU with 16 GB VRAM and 128 GB system memory. Primary simulation training.

CUDA 12.8 · x86_64
UNIT / 02

Inference server

125 GB unified memory for large-model inference and fine-tuning.

CUDA 13.0 · ARM64
UNIT / 03

Edge compute

64 GB unified memory, ROS 2. Primary robot deployment platform.

ARM64 · Docker
UNIT / 04

Research station

Desktop AI supercomputer with 3.67 TB storage.

NVIDIA DGX
§ 09 — Contact

Let's build robots that learn.

Open to research collaboration, consulting, and hardware deployments in robotics and reinforcement learning.

Get in touch · Portfolio
GitHub · LinkedIn · Email · Portal