TelosRL builds reinforcement-learning systems for legged robots, manipulators, drones, and humanoids — from high-fidelity simulation through validated hardware deployment.
One reinforcement-learning pipeline, retargeted across ground, aerial, and bipedal systems — ground truth in simulation, validated on hardware.
Dynamic locomotion, arm manipulation, and high-velocity throwing with sim-to-real transfer.
Autonomous navigation, obstacle avoidance, and precision landing for multi-rotor platforms.
Whole-body control, bipedal locomotion, and dexterous manipulation through learned policies.
Vision-language-guided grasp targeting, constrained manipulation, and contact-rich extraction.
Representative output from a quadruped throwing policy — PPO on 2,048 parallel environments.
End-to-end systems from reward design to deployed inference. Figures report best validated metrics to date.
| Ref | Project | Method | Result |
|---|---|---|---|
| R-01 | Dynamic arm throwing | PPO policy for high-velocity throwing: kinematic-chain whip, phase clamping, 3→10 m curriculum, velocity-triggered release, FK collision avoidance. | 12 m/s release speed |
| R-02 | Agent autonomy stack | Voice-commanded robot control with vision-language models: person tracking, ASR, GPS fusion, and a tactical bridge on edge compute. | 8 B parameters |
| R-03 | Persistent object tracking | Real-time multi-object tracking with a four-state machine, sparse optical flow, and fisheye distortion correction. | 60 fps tracking rate |
| R-04 | Edge LLM inference | Multi-machine local model serving with quantized GGUF models and speculative decoding for robot command interpretation. | 125 GB unified memory |
| R-05 | GRPO policy optimization | Group Relative Policy Optimization with parameter-efficient fine-tuning, converging in under fifteen minutes on edge hardware. | 14 min train time |
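For readers unfamiliar with R-05, the core idea of Group Relative Policy Optimization is that each sampled response is scored against the other samples drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that normalization step only. The function name, the reward values, and the choice of population standard deviation are illustrative assumptions and are not taken from the TelosRL implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds the scalar rewards for several completions sampled
    from the same prompt. `eps` guards against a zero-variance group.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled interpretations of one robot command.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Samples that beat their group average receive a positive advantage and are reinforced, and samples below it are pushed down. A group with identical rewards yields zero advantage for every sample.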
Foundational and ongoing work, published openly. The full portfolio site collects write-ups, notebooks, and demos.
Full portfolio & project write-ups — bmaxdk.github.io
Native tools built alongside the research. First up is TermX: SSH terminal, files & tunnels, with the whole fleet in your pocket.
A fast, native SSH client for iPhone and iPad. Run a real terminal, move files over SFTP, forward ports, and watch live video — all over one encrypted connection. Credentials and keys stay in the Keychain behind Face ID, and the app collects no data.
From policy design through deployed inference — every layer owned and validated.
PPO, SAC, and GRPO with GPU-parallel environments. Curriculum learning, domain randomization, and reward shaping built in.
Domain randomization and system ID validated on hardware.
Detection, tracking, and pose estimation for edge deploy.
On-device LLM inference for command parsing.
ROS 2, sensor fusion, behavior trees, distributed compute.
CUDA optimization, quantization, real-time tuning.
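As a rough illustration of the curriculum learning and domain randomization listed above, the sketch below advances a throwing target from 3 m to 10 m as the success rate improves and draws per-environment physics parameters for 2,048 parallel environments. Only the 3 m and 10 m endpoints and the environment count come from this page. The class and function names, the 1 m step, the 0.8 promotion threshold, and every parameter range are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class DistanceCurriculum:
    """Advance the target distance once the policy succeeds often enough."""
    start_m: float = 3.0     # curriculum start, from the page
    end_m: float = 10.0      # curriculum end, from the page
    step_m: float = 1.0      # assumed increment per promotion
    promote_at: float = 0.8  # assumed success-rate threshold

    def __post_init__(self):
        self.target_m = self.start_m

    def update(self, success_rate: float) -> float:
        """Raise the target by one step if the success rate clears the bar."""
        if success_rate >= self.promote_at:
            self.target_m = min(self.target_m + self.step_m, self.end_m)
        return self.target_m


def sample_physics(rng: random.Random) -> dict:
    """Draw one environment's physics parameters (all ranges are assumptions)."""
    return {
        "friction": rng.uniform(0.5, 1.25),
        "payload_mass_kg": rng.uniform(0.05, 0.5),
        "motor_strength_scale": rng.uniform(0.9, 1.1),
    }


rng = random.Random(0)  # fixed seed so the draw is reproducible
curriculum = DistanceCurriculum()
envs = [sample_physics(rng) for _ in range(2048)]  # one parameter set per environment
```

In a training loop, `curriculum.update(success_rate)` would be called after each evaluation window, and the physics parameters would typically be resampled at episode reset so the policy never sees one fixed simulator.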
Heterogeneous GPU infrastructure spanning simulation, inference, and edge deployment.
Desktop GPU, 16 GB VRAM, 128 GB system. Primary simulation training.
CUDA 12.8 · x86_64
125 GB unified memory for large-model inference and fine-tuning.
CUDA 13.0 · ARM64
64 GB unified, ROS 2. Primary robot deployment platform.
ARM64 · Docker
Desktop AI supercomputer with 3.67 TB storage.
NVIDIA DGX
Open to research collaboration, consulting, and hardware deployments in robotics and reinforcement learning.