One problem with the current implementation is that if you bring up the browser (or a second browser) after playback has already started, it has no concept of the current position (and will actually appear paused at the start).
What I was thinking is having the master time position always come from the server. The server would either maintain that time itself (calculated as time since playback started) or get it from MTC. This way all browsers would always display the same position rather than each having an independent concept of the current time.
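In code terms the idea is something like this (a rough sketch only; the class and method names here are illustrative, not the actual implementation):

```ts
// Hypothetical sketch of a server-side master clock.
// The server owns the playback position; browsers only ever query it.
class MasterClock {
  private startedAt: number | null = null; // wall-clock ms when playback started
  private pausedPos = 0;                   // position in ms while paused/stopped

  play(): void {
    if (this.startedAt === null) {
      this.startedAt = Date.now() - this.pausedPos;
    }
  }

  pause(): void {
    this.pausedPos = this.positionMs();
    this.startedAt = null;
  }

  // Every browser asks the server for this instead of keeping its own time,
  // so a late-joining client lands on the same position as everyone else.
  positionMs(): number {
    return this.startedAt === null ? this.pausedPos : Date.now() - this.startedAt;
  }
}
```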
I don’t think it’s too hard to implement, but it would require selecting a second (output) MIDI port in the config file (ie: a virtual MIDI cable in the opposite direction).
cantabile-media-server explicitly forces the video’s volume off. There’s no control over this atm - partly because it makes getting play/pause control of the browser’s video player easier. If the volume is on, most browsers will block attempts to automatically start playback - unless you tweak settings for that site and explicitly allow it.
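For background, this comes down to browser autoplay policy: a muted video can be started programmatically, but an unmuted one usually can’t without a user gesture. Roughly (a generic browser-side sketch, not the server’s actual code):

```ts
// Generic sketch: muted autoplay is allowed by browser policy, while
// unmuted autoplay is usually blocked unless the user has interacted
// with the page or explicitly allowed the site.
const video = document.querySelector('video')!;
video.muted = true;              // forcing volume off keeps play() reliable
video.play().catch((err) => {
  // With muted = false, this rejection is the common case.
  console.warn('Autoplay blocked:', err);
});
```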
Is there a use case for audio playback in setups like this?
So, getting a webrtc video feed into a browser isn’t trivial, but I did get something working.
I also found this tool: go2rtc. It lets you stream cameras directly to WebRTC without needing OBS/mediamtx, and the delay between activating a feed and the video appearing is much shorter.
The problem I’m facing now is that each server seems to have different requirements for establishing the WebRTC connection (ie: getting a webrtc feed from mediamtx is different to go2rtc, etc…).
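For reference, the common WHEP baseline on the browser side is roughly this (a simplified sketch that ignores trickle ICE; the function name is illustrative, and the per-server quirks show up around exactly this exchange):

```ts
// Minimal WHEP client sketch: POST an SDP offer, get an SDP answer back.
// Servers differ in ICE handling and how strict they are about the offer,
// which is where the per-server differences come in.
async function connectWhep(endpoint: string, videoEl: HTMLVideoElement) {
  const pc = new RTCPeerConnection();
  pc.addTransceiver('video', { direction: 'recvonly' });
  pc.addTransceiver('audio', { direction: 'recvonly' });
  pc.ontrack = (ev) => { videoEl.srcObject = ev.streams[0]; };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/sdp' },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: 'answer', sdp: await res.text() });
  return pc;
}
```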
Our band uses a video which contains audio as part of its closing. It’s not really a problem if the video volume is off, as it is easy enough to extract the audio and play that as a separate media file - unless they go out of sync.
I’ve just put up cantabile-media-server v0.0.5 which adds support for WebRTC camera feeds.
For this to work you need a WHEP-compatible WebRTC server, and you need to prefix the WHEP end-point URL in programList.txt with webrtc+. eg:
```
# program number to media file mapping
1: video1.mp4
2: webrtc+http://localhost:8889/camera1/whep
```
I’ve tested this with OBS to mediamtx and it works, but the feed can be quite slow to initially appear (see demo below).
I’ve also tested with go2rtc, which gives much faster startup of the camera feed, but currently its WHEP implementation is incomplete so it doesn’t work with cantabile-media-server. I’ve logged an issue and hopefully it will get fixed soon, as I think go2rtc is probably a better fit for this kind of setup.
Here’s a demo showing setup in OBS, starting mediamtx, config in cantabile-media-server and camera feed appearing in browser. It also shows the slow start time of mediamtx feeds.
Yesterday I put up a build of Cantabile with support for sending MIDI Time Code (MTC).
Today, I wrote the code in cantabile-media-server to listen for and track the MTC events sent by Cantabile and it keeps sync nicely - see here.
(Note Cantabile is displaying a decimal fraction of a second while the media server is showing frames and qframes - hence what looks like a discrepancy but isn’t.)
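For anyone curious, the quarter-frame decode is roughly the standard MTC scheme (a simplified sketch, not the server’s actual code):

```ts
// Sketch of assembling MTC quarter-frame messages (status 0xF1) into a
// full timecode. Eight quarter-frames carry one complete timecode, so a
// full reading only arrives every two frames.
const RATES = [24, 25, 29.97, 30];
const pieces = new Array<number>(8).fill(0);

function onQuarterFrame(data1: number): void {
  const piece = (data1 >> 4) & 0x7; // which of the 8 nibbles this is
  pieces[piece] = data1 & 0x0f;
  if (piece !== 7) return;          // timecode only complete on the last piece

  const frames  = pieces[0] | ((pieces[1] & 0x1) << 4);
  const seconds = pieces[2] | ((pieces[3] & 0x3) << 4);
  const minutes = pieces[4] | ((pieces[5] & 0x3) << 4);
  const hours   = pieces[6] | ((pieces[7] & 0x1) << 4);
  const fps     = RATES[(pieces[7] >> 1) & 0x3];

  const ms = (hours * 3600 + minutes * 60 + seconds) * 1000 + (frames / fps) * 1000;
  // ... update the server's master position from ms
}
```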
Next job is to figure out how to keep video in sync with it.
The big change here is the server now maintains a concept of the current video position and tells the browsers where to start playback from. After the video is started, it doesn’t do anything further to keep things in sync (at least not yet).
In the demo I’m using a video that shows the current time. When Cantabile is paused the millisecond timestamp is a little off: MTC is only accurate to the frame, and at 30fps a frame only arrives every ~33ms, so the video doesn’t have a frame for every millisecond. Basically it’s showing the closest frame.
Still a lot of rough edges to iron out, but I’m really surprised by how well this is working - even across wifi.
That looks like really good progress. I have been doing a lot of work outside of the studio - either rebuilding structures in the garden or refactoring montage.factory to support the Montage M (you probably know yourself that some jobs are too big/complex to try and do piecemeal, so I tend to use chunks of time off work for the really complex programming tasks) - so I have not run the video server lately. Over the weekend I will grab the latest Cantabile and video server builds and see how it is going.
But this is certainly floating my boat in my simple video playback use case.
Drift Compensation = the server pings clients every 1 second with the current time, and clients adjust the playback rate of the video player to keep it in sync and prevent drift over time. You can see the calculated playback rate updating in the browser once per second while playing. If things get too far out of sync (> 1 second) it just resyncs immediately to the correct position and carries on.
Latency Compensation = a browser-side setting (ie: per client) to adjust for any latency from Cantabile to the browser. It causes the drift sync above to actually sync to a point ahead in time. In the demo you’ll see that if I set latency comp to 1000ms, the video plays 1 second ahead of Cantabile. The setting is ignored when stopped or paused.
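In code terms, the client-side logic is roughly this (a simplified sketch; the message shape and names are illustrative, not the actual protocol):

```ts
// Once a second the server sends its current position; the client nudges
// playbackRate toward it, hard-resyncing if it is more than 1s out.
const HARD_RESYNC_SEC = 1.0;
let latencyCompMs = 0; // per-client latency compensation (ignored when not playing)

function onServerTick(video: HTMLVideoElement, serverPosMs: number, playing: boolean): void {
  // Latency compensation shifts the target ahead of the server's clock.
  const targetSec = (serverPosMs + (playing ? latencyCompMs : 0)) / 1000;
  const driftSec = targetSec - video.currentTime;

  if (Math.abs(driftSec) > HARD_RESYNC_SEC) {
    video.currentTime = targetSec;       // too far out: snap straight to position
    video.playbackRate = 1.0;
  } else {
    // Gently speed up or slow down so we converge before the next tick,
    // e.g. 0.2s behind => rate 1.2 for the next second.
    video.playbackRate = 1.0 + driftSec;
  }
}
```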
Still working on this so it’s only on the dev branch.
Sounds like great progress. Although I said a simple trigger is good enough for me, if you have MTC sync working I can see it being an even better solution for the tracks where I want a video cue precisely timed to a musical event, to the same precision I can achieve with the DMX cues.
Tonight is the first time I have looked at this since my original experimentation a few weeks back. I migrated my updated Cantabile Floyd Setlist (updated for the media server video selection/triggers, not SCS11 triggers) from my DAWPC onto my GIGPC to see if it also triggered the videos via the cantabile-media-server on the Video PC over rtpMIDI. It did, with no issues and no changes required.
I then updated to v0.0.5. I noted all the console diagnostic info had disappeared, which I had found useful. I checked the docs and noticed that there are now command line options to add this back, but I also noted that certain ones like --version will report and then terminate the server. Personally I think it would be better if they didn’t do that, as the server is intended to run until it is stopped - or am I missing something in the setup? There is of course the question of how often you need this info once it is all stable and working, so not a biggie, but I thought I would mention it to check my understanding.
Traditionally --version displays the version number and quits. The idea is to check the installation and version without requiring a valid configuration. I can make it display the version as part of a normal run too.
As for logging I’m still tweaking what’s shown normally vs in verbose mode. What in particular would you want shown all the time?
Given this is not meant to be a program that quits unless you press CTRL+C or close the command prompt, options like --version closing the program did not feel right (compared to command line software that is usually short-lived), and what you had before you introduced the options was about right in terms of info reporting and logging.
Having said that, there is really no right or wrong way for this, and as I said above, maybe once you have a stable major release there is no need for the info to be shown all the time, so your instincts and what you have done to control the logging via options are better long term.
The main thing for me is that the core functionality is maturing nicely.
The web browser places all three layers directly over each other. Tomorrow’s job is to add MIDI CC controls to allow hiding and showing individual layers.
A setup like the above works as follows:
- When the camera layer is hidden and the video player is stopped, a static image is shown (eg: the band logo).
- When the media layer is an image or a playing video, it replaces the static background image.
- When the camera layer is shown, the middle video player layer continues to play in sync in the background.
ie: this setup would let you show a static image or a video and then switch seamlessly between it and a camera feed.
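Conceptually the browser side is just three absolutely-positioned elements stacked on top of each other, with show/hide toggling visibility (a hypothetical sketch of the idea, not the server’s actual markup):

```ts
// Hypothetical sketch of the three-layer stack: background image,
// video player, camera feed, all occupying the same screen area.
// Showing/hiding a layer just toggles its visibility; the video layer
// keeps playing underneath so it is still in sync when revealed.
const layers = ['image-layer', 'video-layer', 'camera-layer']; // bottom to top

layers.forEach((id, z) => {
  const el = document.getElementById(id)!;
  el.style.position = 'absolute';
  el.style.inset = '0';
  el.style.zIndex = String(z); // later layers stack on top
});

// e.g. driven by a MIDI CC message: value >= 64 shows the layer
function setLayerVisible(id: string, visible: boolean): void {
  document.getElementById(id)!.style.visibility = visible ? 'visible' : 'hidden';
}
```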
Perfect!! The only thing I’d add is that the camera layer should support multiple cameras (e.g. one facing the band or different band members at different times, one facing the audience), but perhaps that’s handled in the camera server. In any event, what I’m hoping to do is map some unused pads on my keyboard to each camera, so I can turn a camera on/off from the keyboard or via a binding to a transport position.
There’s a couple of ways to handle multiple cameras…
As you suggest, one way would be to control it at the feed source end - eg: using OBS or similar software to switch between different cameras. I think that would probably work best if you only ever need to show one camera feed at a time (ie: you don’t need different video displays showing different camera views).
The other way would be to just create additional layers, one for each camera, and then hide/show the appropriate layers. The above example shows three layers, but you can create more if you need them.
Finally, I’m thinking of providing a way to have switchable media on multiple layers on the same channel. This requires an alternative mechanism for selecting the media: currently it uses the MIDI channel’s program number, and this would need multiple “programs” per channel. I have a couple of ideas here.
To be honest I’m not totally convinced about camera support in this kind of setup: I’m not sure the latency is tolerable, and the startup delay of OBS->mediamtx feeds might make switching between feeds janky. go2rtc is better in this regard but has other incompatibilities.
That said, the camera feed side of things is pretty much done. Once I get this next version out I’ll be keen for feedback.
Existing configurations should run without change - the only real difference being that ‘master’ time sync mode is now the default, whereas previous versions behaved in ‘none’ mode.