# Media Streaming Spec
This specification defines how video is encoded, segmented, stored, and delivered for real-time screen sharing between Sentinels and Proctors.
This spec was created by iterating with GPT 5.2; the chat transcript is in the repository root as of commit a46981632a5e3c2a9bac8f540a6cefa1d06d4082.
## Overview
| Aspect | Decision |
|---|---|
| Codec | H.264 |
| Container | Fragmented MP4 (fMP4) |
| Browser Delivery | Media Source Extensions (MSE) |
| Fragment Duration | Variable, short (implementation choice for real-time delivery) |
| Keyframes | On demand, at most every 20-30 s, and on FPS change |
| Memory Buffer | All fragments from the last 15-20 seconds (configurable) |
| Disk Storage | All fragments written as-is |
| Resolution | Max 1080p, downscaled preserving aspect ratio |
| Framerate | Variable over time, from 1 frame per 5 s (0.2 fps) up to 5 fps |
| Live Transport | WebSocket (server pushes fragments) |
| Historical Transport | HTTP (Proctor fetches from disk) |
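The decisions in the table above can be collected into a single configuration object. This is an illustrative sketch, not part of the spec: the names (`STREAM_CONFIG`, `fitTo1080p`) and the exact H.264 profile in the codec string are assumptions; the values come from the table.

```typescript
// Illustrative configuration capturing the decisions above.
// Names are hypothetical; values come from the spec table.
const STREAM_CONFIG = {
  // MSE codec string for H.264 in fMP4. The profile/level (Constrained
  // Baseline 3.1) is an assumption; the spec only mandates H.264.
  mimeType: 'video/mp4; codecs="avc1.42E01F"',
  // Keyframes: on demand, on FPS change, and at most this many seconds apart.
  maxKeyframeIntervalSec: 30,
  // In-memory window on the Server (configurable per the spec).
  memoryBufferSec: 20,
  // Resolution cap: downscale to fit, preserving aspect ratio.
  maxWidth: 1920,
  maxHeight: 1080,
  // Framerate bounds: one frame every 5 s up to 5 fps, variable over time.
  minFps: 1 / 5,
  maxFps: 5,
} as const;

// Downscaling helper: fit (w, h) inside the 1080p cap, preserving aspect ratio.
function fitTo1080p(w: number, h: number): { width: number; height: number } {
  const scale = Math.min(1, STREAM_CONFIG.maxWidth / w, STREAM_CONFIG.maxHeight / h);
  return { width: Math.round(w * scale), height: Math.round(h * scale) };
}
```

For example, a 4K capture (3840x2160) scales to 1920x1080, while a 720p capture passes through unchanged.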
## Components
- Sentinel: Captures screen, encodes video, sends fragments to Server
- Server: Receives fragments, buffers in memory, writes to disk, relays to Proctors
- Proctor: Receives fragments, decodes via MSE, displays video
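The Server's rolling in-memory buffer can be sketched as follows. This is a minimal illustration under stated assumptions: `Fragment`, `FragmentBuffer`, and `joinFragments` are hypothetical names, and the assumption that each fragment carries a timestamp and a keyframe flag is not mandated by the spec, which only requires buffering the last 15-20 seconds and serving joining Proctors from a decodable point.

```typescript
// Hypothetical fragment record; the spec only says fragments are buffered
// in memory, written to disk, and relayed.
interface Fragment {
  timestampMs: number;  // capture time of the fragment's first frame
  isKeyframe: boolean;  // fragment starts with a keyframe (IDR)
  data: Uint8Array;     // raw fMP4 moof+mdat bytes
}

// Rolling in-memory buffer: keeps all fragments from the last `windowMs`.
class FragmentBuffer {
  private fragments: Fragment[] = [];
  constructor(private windowMs: number = 20_000) {}

  push(frag: Fragment): void {
    this.fragments.push(frag);
    // Evict fragments that fell out of the window.
    const cutoff = frag.timestampMs - this.windowMs;
    while (this.fragments.length > 0 && this.fragments[0].timestampMs < cutoff) {
      this.fragments.shift();
    }
  }

  // Fragments a newly joining Proctor should receive: everything from the
  // most recent buffered keyframe onward, so the decoder can start cleanly.
  joinFragments(): Fragment[] {
    let start = 0;
    for (let i = this.fragments.length - 1; i >= 0; i--) {
      if (this.fragments[i].isKeyframe) { start = i; break; }
    }
    return this.fragments.slice(start);
  }
}
```

Starting the join set at the newest keyframe keeps join latency low: a Proctor never waits for the next keyframe, at the cost of replaying at most one keyframe interval of video.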
## Documentation
- Definitions: fragments vs keyframes vs init
- H.264 codec settings and framerate
- fMP4 structure and initialization
- Fragment cadence and join fragments
- Server-side buffering for live streams
- Persistent fragment storage
- In-stream and application metadata
- WebSocket and HTTP delivery
- How Proctors join a stream
- Keyframe and FPS change requests
Last updated by J.H.F.