Combined Per Pixel image sequences can be encoded in any video editing tool. However, some video editing tools have resolution limitations or export with unintended color conversions. The following is a set of recommendations for FFmpeg, which we recommend for maximum control over your encoding parameters.
This process is optional, but recommended for high-sensor-count clips which need to play back in real time.
Depthkit Studio currently exports multi-perspective Combined Per Pixel image sequences with each sensor perspective laid out in a single row. For those exporting with more than three perspectives, this layout may not be ideal for your publishing platform.
This process reformats your clip and metadata into stacked rows and columns, creating an overall aspect ratio which better fits within the constraints of some video codecs and graphics hardware, leading to higher resolution and better performance.
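As a rough illustration of why this helps, the layout math can be sketched as below. The perspective count and per-perspective dimensions are made-up example values for illustration, not Depthkit defaults:

```python
import math

def stacked_layout(num_perspectives, persp_w, persp_h, rows):
    """Compute the overall atlas size when perspectives are wrapped into rows."""
    cols = math.ceil(num_perspectives / rows)
    return cols * persp_w, rows * persp_h

# Hypothetical 5-perspective clip, each perspective 1024x1024
# (example values only).
w1, h1 = stacked_layout(5, 1024, 1024, rows=1)  # single-row export
w2, h2 = stacked_layout(5, 1024, 1024, rows=2)  # after the script

print(w1, h1, round(w1 / h1, 2))  # 5120 1024 5.0  -> very wide strip
print(w2, h2, round(w2 / h2, 2))  # 3072 2048 1.5  -> closer to square
```

The 1.5:1 result fits much more comfortably under typical codec and GPU texture limits than a 5:1 strip of the same pixel count.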
- Combined Per Pixel image sequence with metadata file
- Download and install Python 3+ →
- DepthkitCPPToRows.py script →
- Download the DepthkitCPPToRows.py script, and save it somewhere handy.
- Open a command line interface.
- At the start of the command, call Python by typing either "python" or "py". If the command is unrecognized, add your installation of Python to the Windows Path by following this guide.
- Leave a space, then drag and drop the DepthkitCPPToRows.py Python script.
- Leave a space, then drag and drop your metadata file.
- Leave a space, then drag and drop the image sequence folder.
- Leave a space, then drag and drop a new folder for the export directory. This can be named anything you like.
python DepthkitCPPToRows.py 'D:\Depthkit\Assets\Depthkit_Project\Depthkit_metadata.txt' 'D:\Depthkit\Assets\Depthkit_Project\Depthkit_PNG_Sequence' 'D:\Depthkit\Assets\Depthkit_Project\Export_Directory'
- Press enter to run the command.
- You will see the new image sequence and metadata file write to the export directory. Use this sequence and metadata in place of those exported from Depthkit.
--help: show this help message and exit;
--threads THREADS: number of threads (default is 8); check the specifications of your CPU to see how many threads it can support;
--rows ROWS: number of rows (default is 2).
FFmpeg is a powerful command-line tool that allows for very specific parameters to be set during encoding. These parameters are critical to the performance and quality of the assets once they are played back in Unity. Minute changes in color space or aspect ratio metadata can be the difference between an immaculate reconstruction and one riddled with distortion and artifacts.
For an introduction to FFmpeg, explore the documentation below.
FFmpeg Documentation portal →
We recommend adding FFmpeg to Windows Path as described in this guide so that you don't have to jump around to different directories in the command line interface.
The following examples assume that you will be starting from a Combined Per Pixel image sequence as the input to FFmpeg, which preserves the most quality in the resulting asset.
ffmpeg -r 30 -f image2 -start_number 0 -i source_img_sequence_prefix_%06d.png -i audio_file.wav -map 0:v -map 1:a -c:v libx264 -x264-params mvrange=511 -c:a aac -b:a 320k -shortest -vf scale='min(4096,iw)':'min(ih,4096)':force_divisible_by=2:out_color_matrix=bt709:out_range=full,setsar=1:1 -colorspace bt709 -color_primaries bt709 -color_trc bt709 -color_range pc -b:v 5M -pix_fmt yuv420p output.mp4
-r 30 (framerate) - interprets the source image sequence at 30 frames per second to match your Depthkit footage.
-f image2 (format) - interprets the source as an image sequence.
-start_number 0 - specifies which image in the sequence to start on. This number can be found in the filename of the first image in your image sequence.
-i source_img_sequence_prefix_%06d.png - uses only frames which begin with source_img_sequence_prefix_ (replace this with the filename prefix of your image sequence) and end with a 6-digit zero-padded number (%06d) and the .png extension.
-i audio_file.wav - specifies the source audio file. The beginning of the audio file must be trimmed to match the start of the video, and the audio must be at least as long as the video to ensure the CPP video doesn't get trimmed in the process.
-map 0:v -map 1:a - maps the first -i source to the video channel, and the second -i source to the audio channel. Be sure to arrange your input options so they map to the channels properly.
-c:v libx264 (codec) - encodes using the libx264 (H.264) codec. This can be changed to other codecs, like the libx265 (H.265/HEVC) codec, depending on your target publishing platform.
-b:v 5M (bitrate) - encodes with a target bitrate of 5 Mbps. You can adjust this based on your preference to balance file size and quality.
Using a constant rate factor with -crf 15
An alternative to targeting a specific bitrate with -b:v (x)M is to use a constant rate factor with -crf 15, which targets uniform quality across the clip and lets the bitrate vary. It ranges from 0–51 (0 is lossless and 51 is heavily compressed) with a default of 23. We recommend 15 for a high-quality asset, but raising this value will reduce file size if needed.
-pix_fmt yuv420p - applies the YUV 4:2:0 pixel format, which reduces file size.
-x264-params mvrange=511 - for H.264 videos, this prevents artifacts generated when the motion vector range (mvrange) exceeds the limits of the H.264 level 5.2 specification. Keep this value at or under 511 to stay within spec. See this documentation from Oculus for further information.
Use these options only if embedding audio.
-c:a aac - sets the resulting audio codec to AAC, but this can be any codec supported by FFmpeg.
-b:a 320k - sets the audio bitrate to 320 kbps. You can adjust this based on your preference to balance file size and quality.
-shortest - stops the output at the end of the shorter source.
Downscaling your Combined Per Pixel video files
You may need to downscale your Combined Per Pixel video files due to hardware decoding resolution limitations on your computer or target device. In the example below, we are downscaling to 4096 to fit within hardware NVDEC resolution limitations, but you may need to specify a lower maximum resolution, such as 2048 for mobile.
-vf scale='min(4096,iw)':'min(ih,4096)' (video filters) - scales the video to a maximum width of 4096 and a maximum height of 4096. The image aspect ratio doesn't need to be maintained, so this filter scales the width and height independently to fit within your desired target resolution.
setsar=1:1 - forces a pixel aspect ratio of 1:1, which, when scaling, is necessary to ensure the best performance in Unity.
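To preview what the scale expression will do to a given clip, its arithmetic can be sketched in Python. This is a rough sketch of the intended math for illustration only; it does not invoke FFmpeg:

```python
def preview_scale(iw, ih, max_dim=4096):
    """Mimic scale='min(max_dim,iw)':'min(ih,max_dim)':force_divisible_by=2.

    Clamps each dimension to max_dim independently, then rounds down
    to an even number so the result is valid for yuv420p encoding.
    """
    ow = min(max_dim, iw)
    oh = min(ih, max_dim)
    return ow - ow % 2, oh - oh % 2

print(preview_scale(5120, 2880))        # (4096, 2880) - width clamped
print(preview_scale(2049, 4097))        # (2048, 4096) - both adjusted
print(preview_scale(1920, 1080, 2048))  # (1920, 1080) - unchanged
```

Note that because width and height are clamped independently, a clip exceeding the limit in only one dimension will change aspect ratio; this is expected, and is why the metadata must be updated to match (see below).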
When resizing a Depthkit clip, you must also edit your metadata file to reflect this change in resolution.
Metadata Editing Instructions. To edit these values, open your exported metadata file in a text editor and scroll to the very bottom of the file.
Edit the textureHeight and textureWidth to match the resolution of your re-encoded clip. If you have calculated the height automatically, verify the video resolution by right clicking the video file, selecting properties, and viewing the frame height/width under Details.
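If you prefer to script this step, the edit can be sketched as below. This assumes the metadata file is JSON-formatted text with top-level textureWidth and textureHeight keys, as described above; the path and dimensions are placeholders to replace with your own:

```python
import json

def update_metadata_resolution(path, new_width, new_height):
    """Rewrite textureWidth/textureHeight in a metadata file.

    Assumes the file is JSON text with top-level textureWidth and
    textureHeight keys; adjust if your metadata is structured differently.
    """
    with open(path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    meta["textureWidth"] = new_width
    meta["textureHeight"] = new_height
    with open(path, "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)

# Example: after downscaling the clip to 4096x2560
# update_metadata_resolution("Depthkit_metadata.txt", 4096, 2560)
```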
Color spaces in FFmpeg
By default, FFmpeg exports video in the BT.601 color space, which will subtly shift the colors of the Combined Per Pixel video and cause the geometry to misalign in the Unity renderer. Include all of the color matrix and color space options to ensure that the exported videos are encoded and stamped using BT.709 standards.
FFmpeg only honors one instance of the -vf option per command, so chain all of your filters into a single -vf argument, separating filters with commas and filter options with colons.
out_range=full - specifies full-range color (also called PC range), as opposed to limited (aka TV) range color. This ensures that the color ranges for depth remain full and undistorted. Read more about FFmpeg color space here.
-colorspace bt709, -color_primaries bt709, -color_trc bt709, and -color_range pc - inject color space metadata into the output file so that the resulting video is interpreted properly by the player.
Using FFprobe to determine video Color Space and Color Range
If you are uncertain whether an already-encoded video has the proper color range, you can check it with FFprobe, a tool that comes with FFmpeg for reading data from existing video files. If you've received a Combined Per Pixel video file, use the following command to see its encoding settings and metadata.
ffprobe -hide_banner -select_streams v:0 -show_streams <videofile>.mp4
You will see a dump of metadata about the video. The Stream information near the end of the dump should look like this; most importantly, the color format should contain both pc and bt709:
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), **yuvj420p(pc, bt709),** 4096x2560 [SAR 1:1 DAR 8:5], 5071 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Another way to see this same information is to scroll up in the dump to the color_space and color_range fields, which should read bt709 and pc respectively. Incorrectly encoded videos will instead show limited (tv) range or an unknown or BT.601 color space.
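To automate this check, the key=value lines that ffprobe -show_streams prints can be fed through a small parser. A sketch, operating on the captured dump text:

```python
def has_full_range_bt709(ffprobe_dump: str) -> bool:
    """Return True if an `ffprobe -show_streams` dump reports
    full-range (pc) BT.709 color."""
    # -show_streams emits one key=value pair per line.
    tags = dict(
        line.split("=", 1)
        for line in ffprobe_dump.splitlines()
        if "=" in line
    )
    return tags.get("color_range") == "pc" and tags.get("color_space") == "bt709"

good = "color_range=pc\ncolor_space=bt709\ncolor_transfer=bt709"
bad = "color_range=tv\ncolor_space=unknown"
print(has_full_range_bt709(good))  # True
print(has_full_range_bt709(bad))   # False
```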
Quickly scrubbing H.264 videos requires frequent I-frames, which are useful when setting reconstruction settings or blocking. By default, FFmpeg places them at most 250 frames apart, which leaves large gaps on the timeline where the player doesn't have a complete frame to reference. Add an I-frame to every frame by adding -g 1 to the FFmpeg command.
Making each frame into an I-Frame dramatically increases the file size, so we recommend encoding multiple versions of your asset:
- Low-Resolution 'Proxy' - Encode this with an I-Frame every frame, but smaller dimensions for a working asset that scrubs quickly to speed up the placement and timing of the asset in Unity.
- Full-Resolution Final Asset - Encode this at the final resolution, with only enough I-Frames to avoid artifacts. Once the asset is timed and placed in Unity and you don't need to scrub the timeline, replace the proxy version with this full-resolution asset for the final build.
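The scrubbing gap described above is simple arithmetic; at 30 fps, FFmpeg's default keyframe interval can leave the player several seconds away from the nearest complete frame:

```python
def worst_case_seek_gap_seconds(gop_size, fps=30.0):
    """Longest possible stretch without an I-frame for a given -g value."""
    return gop_size / fps

print(worst_case_seek_gap_seconds(250))  # FFmpeg default: ~8.3 s between I-frames
print(worst_case_seek_gap_seconds(30))   # -g 30: at most 1 s
print(worst_case_seek_gap_seconds(1))    # -g 1: every frame is decodable
```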
-sc_threshold 99 - sets the "scene cut" threshold to 99, meaning that if 1% of the frame is different enough, it will be considered a new "scene," which triggers the insertion of a new keyframe.
-keyint_min 1 -g 30 - sets the minimum distance between any two keyframes to 1 and the maximum to 30, meaning the encoder is allowed to place keyframes on consecutive frames, and must place at least one keyframe every thirty frames. If blocky artifacts persist even with these settings, reduce the -g parameter until they are gone.
-c:v copy - use this in place of the -c:v <codec> option to copy a pre-encoded video source without re-encoding, preserving the quality of the video. This can only be used when muxing an existing video stream into a new container; it cannot be used when encoding an image sequence to video, or transcoding one video to another.
-c:a copy - use this in place of the -c:a <codec> option to copy the audio source without re-encoding. Use this only if your audio source is already compressed with the codec and bitrate you want. This option embeds the audio stream within the video with no changes.