Total Timesteps
—
Target: 10,000,000
Episode Reward
—
Mean Reward per Episode
Training FPS
—
Steps per Second
KL Divergence
—
Policy Stability
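
The KL Divergence card tracks how far the updated policy has moved from the one that collected the data. A minimal sketch of one common way to estimate it is below. It assumes the dashboard reports a sample-based approximation like the one Stable-Baselines3 logs as `approx_kl`. The source does not confirm this, and the function name `approx_kl` is illustrative.

```python
import math
from typing import Sequence


def approx_kl(old_log_probs: Sequence[float], new_log_probs: Sequence[float]) -> float:
    """Estimate KL(old || new) from log-probabilities of actions sampled under the old policy.

    Uses the estimator mean((r - 1) - log r) with r = new_prob / old_prob,
    which is non-negative for every sample.
    """
    if len(old_log_probs) != len(new_log_probs) or len(old_log_probs) == 0:
        raise ValueError("log-probability sequences must be non-empty and of equal length")
    total = 0.0
    for old_lp, new_lp in zip(old_log_probs, new_log_probs):
        log_ratio = new_lp - old_lp  # log r
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(old_log_probs)
```

A value near zero means the update barely changed the policy. A steadily growing value suggests the updates are too aggressive.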
📈 Reward Curve (Live)
📉 KL Divergence & Loss
🎮 One-Click Inference
Load the trained PPO policy, run 500 inference steps in MuJoCo, and visualize the joint trajectories
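
The inference step described above can be sketched as a short rollout loop. This is an illustration under stated assumptions, not the dashboard's actual backend code: `policy` is any object with a Stable-Baselines3-style `predict(obs, deterministic=True)` method, `env` follows the Gymnasium `reset()` / `step()` API, and `rollout_joint_trajectory` and `read_joints` are hypothetical names introduced here.

```python
from typing import Any, Callable, List, Sequence


def rollout_joint_trajectory(
    policy: Any,
    env: Any,
    read_joints: Callable[[Any], Sequence[float]],
    n_steps: int = 500,
) -> List[List[float]]:
    """Run the policy for n_steps and record the joint positions after every step."""
    trajectory: List[List[float]] = []
    obs, _info = env.reset()
    for _ in range(n_steps):
        # Deterministic actions give a repeatable trajectory for plotting.
        action, _state = policy.predict(obs, deterministic=True)
        obs, _reward, terminated, truncated, _info = env.step(action)
        trajectory.append([float(q) for q in read_joints(env)])
        if terminated or truncated:
            # Start a new episode (for example after a fall) so that
            # the full n_steps are always collected.
            obs, _info = env.reset()
    return trajectory
```

With Stable-Baselines3, the object returned by `PPO.load(...)` provides a compatible `predict` method. For a Gymnasium MuJoCo environment, a reader such as `lambda e: e.unwrapped.data.qpos` would likely return the generalized positions. Each row of the result is one time step, so each column can be plotted as one joint's trajectory.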
🦿 Joint Trajectory (Inference Result)
🧪 Saved Experiments & Checkpoints
| Model Name | Size (MB) | Last Modified | Action |
|---|---|---|---|
| Loading... | | | |
⚙️ System Environment
Platform—
Python—
Hostname—
SB3—
MuJoCo—
PyTorch—
NumPy—
Models—
🎯 Reward Function Breakdown
🏗 Technical Architecture
FastAPI
Async backend + WebSocket
MuJoCo
Physics simulation engine
PPO + IL
Hybrid reinforcement + imitation learning architecture
Plotly.js
Interactive real-time charts
CycloneDDS
Communication protocol for the physical robot
Unitree G1
29-DoF humanoid robot