There are a few standards for low latency HTTP streaming – the leading ones are LL-DASH and Apple's LL-HLS. These standards advertise latency in the two to five second range, though end-to-end latency can of course vary. The protocols are interoperable at the segment level, where the Common Media Application Format (CMAF) is used for segmented media. In theory this extends to byte-range support as well (in practice, not so much – yet). Pioneers in the low latency space such as Will Law have proven this flow, enabling a unified CDN cache across protocols. That said, the two protocols have very different representations at the manifest level, and in Apple's case a distinct request and response flow is required. Finally, these technologies were introduced recently, so there are some sharp edges, and real-world players in OTT, for example, don't always behave as advertised. While there are several commercial providers on the encoder and player side, we wanted tight control over the tech stack to minimize latency from our WebRTC ingest pipeline and to facilitate multi-protocol, multi-CDN delivery.
CMAF Origin Architecture
We built the CMAF origin project and deployment profile to match the rest of our services – i.e. Golang microservices, with OpenShift Kubernetes for deployment orchestration. The basic architecture of the CMAF origin bifurcates reader and writer deployment profiles against a single code base, with a shared Redis cache cluster hosting the chunk fragments. Redis – operated via Crossplane with an ElastiCache cluster in our setup – provides a high performance, high availability, low latency location to store chunk fragments as soon as they arrive from the encoder. The split deployment allows for distinct security profiles and lets the reader and writer scale independently. We have a clean set of roles and permissions whereby upstream CDNs only have access to the reader, and our WebRTC ingest / ffmpeg nodes only to the writer. HLS and DASH are both served dynamically from the base XML manifest that ffmpeg's low latency DASH chunked transfer encoding mode pushes via the "writer" service. To quickly generate your own version of these low latency ffmpeg command lines, check out an awesome tool by Peter Chave for ffmpeg command line configuration.
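To make the writer/reader split concrete, here is a minimal Go sketch of the shared chunk store both services talk to. The key layout and the in-memory map (standing in for the Redis cluster) are our own assumptions for illustration, not the actual schema:

```go
package main

import (
	"fmt"
	"sync"
)

// chunkStore is an in-memory stand-in for the shared Redis cluster that
// holds CMAF chunk fragments. In production this would be a Redis client,
// likely with a TTL so only the live edge is retained.
type chunkStore struct {
	mu     sync.RWMutex
	chunks map[string][]byte
}

func newChunkStore() *chunkStore {
	return &chunkStore{chunks: make(map[string][]byte)}
}

// chunkKey builds a per stream/segment/chunk key. The naming scheme here
// is hypothetical.
func chunkKey(stream string, segment, chunk int) string {
	return fmt.Sprintf("cmaf:%s:seg:%d:chunk:%d", stream, segment, chunk)
}

// Put is the writer-side call, invoked as each CMAF chunk arrives from
// ffmpeg's chunked-transfer push.
func (s *chunkStore) Put(key string, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.chunks[key] = append([]byte(nil), data...)
}

// Get is the reader-side call, invoked while serving upstream CDNs.
func (s *chunkStore) Get(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.chunks[key]
	return data, ok
}

func main() {
	store := newChunkStore()
	store.Put(chunkKey("demo", 42, 0), []byte("moof+mdat bytes"))
	data, ok := store.Get(chunkKey("demo", 42, 0))
	fmt.Println(ok, len(data))
}
```

Because writer and reader only meet at this store, each side can carry its own security profile and scale on its own traffic shape.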
Managing Open Connections
The reader is responsible for keeping connections open as encoded frames arrive in the Redis cache by way of the writer. In the case of the Apple protocol, the reader keeps the connection open for the manifests; in the case of DASH delivery, the segment request is kept open and frames are pushed via chunked transfer as they arrive. In both cases, we have to manage timeouts and make effective use of Redis queries to push data over the HTTP/2 channel at precisely the right moment, while using resources efficiently for highly parallel streaming and consumption. Where we have a known time interval of per-frame chunks being pushed, we found polling more efficient than a pub/sub model. Because we can have high confidence of getting a frame or two on a set interval, we can avoid the chattiness of a notification channel that just tells us to make a frame query we know we would have to make anyway. It's worth noting that these intentionally long-running connections can make traditional service level latency monitoring not very useful – to address this, we track application runtime via custom logging attributes.
Segment Reuse & Unified VOD (Video on Demand) Capabilities
The segments are persisted to Amazon S3 in real time, facilitating live VOD clipping (while the stream is still active), instant live-to-VOD flows, as well as simulated live capabilities. In theory, the CMAF format enables a unified HTTP CDN cache across HLS and DASH profiles for these secondary use cases. A live stream would ideally warm the cache for replays, catch-up, and clips, which represent a significant portion of the consumption life cycle for a given event. This, in turn, improves cache hit rates, which both reduces midgress costs and improves the end user experience. Unfortunately, this dream is still in development – the latest releases of Apple players do seem to support it, but we found scattered support across the LL-HLS ecosystem. We hope to revisit this effort soon.
Given the lack of consistent support for byte ranges in Apple devices' LL-HLS implementations, for our initial deployment we used the "named parts" support of the LL-HLS spec and simply map parts to the same byte transfer logic within our cmaf-reader service.
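That mapping can be as small as a URL parser. The part naming scheme below (e.g. `seg_42.part_3.m4s`) is hypothetical – our actual URL layout isn't shown here – but it illustrates how a named-part request resolves to the same (segment, part) coordinates the byte-transfer path already serves:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// partName matches a hypothetical naming scheme for LL-HLS named parts,
// e.g. "seg_42.part_3.m4s" -> segment 42, part 3.
var partName = regexp.MustCompile(`^seg_(\d+)\.part_(\d+)\.m4s$`)

// resolvePart translates a named-part request into the (segment, part)
// pair that the reader's existing byte-transfer logic serves, letting
// LL-HLS parts and LL-DASH chunks share one code path.
func resolvePart(name string) (segment, part int, err error) {
	m := partName.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, fmt.Errorf("not a part URI: %q", name)
	}
	segment, _ = strconv.Atoi(m[1])
	part, _ = strconv.Atoi(m[2])
	return segment, part, nil
}

func main() {
	seg, part, err := resolvePart("seg_42.part_3.m4s")
	fmt.Println(seg, part, err)
}
```

Once byte-range support is dependable across the LL-HLS ecosystem, the same resolver output could instead produce a byte range into the full segment, with no change to the storage layer.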
How do I try this out?
When using the Caffeine app, we include an option for users to select "non-real time". When you select this option, the app consumes the low latency HTTP stream, which runs a few seconds behind the WebRTC stream. In future posts, we will detail our adventures in bringing this low latency stream to OTT devices and some initial findings there.
Hope you enjoyed this post – if you're interested in the low latency video streaming space or social live video in general, do check out our openings.
Authors: Michael Dale & the CDN Team