Shadi Ashnai

Computational Video Premieres in Wolfram Language 12.1

May 19, 2020 - Shadi Ashnai, Manager of Sound and Image, Algorithms R&D

Version 12.1 of the Wolfram Language introduces the long-awaited Video object. The Video object is completely (and only) out-of-core; it can link to an extensive list of video containers with almost any codec. Most importantly, it is bundled with complete stacks for image and audio processing, machine learning and neural nets, statistics and visualization, and many more capabilities. This already makes the Wolfram Language a powerful video computation platform, but there are still more capabilities to explore.

The Video Object

A video file typically has a video and an audio track. Here is a Video object linked to a video file:



In Version 12.1, by default, the Video object is displayed as a small thumbnail and can be played in an external player. There are other appearances that enable in-notebook players, like this Video object with a basic player:


Video["ExampleData/Caminandes.mp4", Appearance -> "Basic"]

Now you can inspect the Video object:


Duration[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]

Information[ Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]

Most video containers support multiple video, audio and subtitle tracks. Having multiple audio or subtitle tracks in a single file is more common than having more than one video track.

Here is an example of a Video object linking to a file with multiple audio and subtitle tracks:



Accessing Parts of a Video

There are several parts of a video you may be interested in extracting. Use VideoFrameList and VideoExtractFrames to extract specific video frames. You can also use VideoFrameList to sample frames from the video uniformly or randomly:


VideoFrameList[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic], 3]

Use this function to create a thumbnail grid (a group of smaller images that summarizes the whole video):


VideoFrameList[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic], 12] // ImageCollage
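VideoExtractFrames, mentioned above, takes explicit times rather than a sample count. Here is a minimal sketch, assuming the same Caminandes example file; the times of 5, 10 and 15 seconds are arbitrary values chosen for illustration:

```wolfram
(* Link to the example video used throughout this post *)
v = Video["ExampleData/Caminandes.mp4"];

(* Extract the frames at 5, 10 and 15 seconds (illustrative times) *)
VideoExtractFrames[v, {5, 10, 15}]
```

The result is a list of images, one per requested time, so it can be passed to ImageCollage just like the output of VideoFrameList.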

You can also trim a segment of a video:


VideoTrim[ Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic], {30, 60}]
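A quick way to check a trim is to compare durations before and after. This is a minimal sketch using Duration as shown earlier; the variable names v and clip are ours, and the exact duration of the clip may differ slightly from 30 seconds depending on where frame boundaries fall:

```wolfram
v = Video["ExampleData/Caminandes.mp4"];

(* Keep only the segment between 30 and 60 seconds *)
clip = VideoTrim[v, {30, 60}];

(* Compare the duration of the original and the trimmed video *)
{Duration[v], Duration[clip]}
```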



The audio track can also be extracted as an Audio object:

Audio[Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic]]


Performing Analysis


Compute the mean color of each frame over time:


VideoTimeSeries[Mean, Video["ExampleData/Caminandes.mp4", Appearance -> Automatic, AudioOutputDevice -> Automatic, SoundVolume -> Automatic]] // ListLinePlot[#, PlotStyle -> {Red, Green, Blue}] &
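The same pattern should work for any function that maps a frame to a number. As a minimal sketch, here is a per-frame brightness curve; the grayscale conversion and the name brightness are our own choices for illustration, not part of the original example:

```wolfram
v = Video["ExampleData/Caminandes.mp4"];

(* Average gray level of each frame, collected as a time series *)
brightness = VideoTimeSeries[
   Mean[Flatten[ImageData[ColorConvert[#, "Grayscale"]]]] &, v];

ListLinePlot[brightness, PlotRange -> All]
```

Because each frame is reduced to a single value, the plot has one curve instead of the three color channels above.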

Count the number of objects (cars, for example) detected in each frame of a video:

v = Video["http://exampledata.wolfram.com/cars.avi"];

ts = VideoTimeSeries[Point[ImagePosition[#, Entity["Word", "car"]]] &, v]

Plot the number of objects (again, using cars as an example) detected in each frame:


TimeSeriesMap[Length @@ # &, ts] // ListLinePlot

Highlight the position of all detected objects (cars) on a sample frame:


HighlightImage[ VideoExtractFrames[v, 1], {AbsolutePointSize[3], Flatten@Values[ts]}]

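We can also use the multiframe version of VideoTimeSeries to perform any analysis that requires multiple frames, such as measuring the difference between consecutive frames and using the peaks to find where the scene changes: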


v = Video["Musician.mp4"]

diffs = VideoTimeSeries[ImageDistance @@ # &, v, Quantity[2, "Frames"], Quantity[1, "Frames"]]

ListLinePlot[diffs, PlotRange -> All]

times = FindPeaks[diffs, Automatic, Automatic, 150]["Times"]

VideoExtractFrames[v, Prepend[times, 0]]

Process a Video

The Wolfram Language already includes a variety of image and audio processing functions. VideoFrameMap is a function that takes one frame or a list of video frames, filters them and writes them to a new video file. Let's use the bullfinch video:

v = Video["ExampleData/bullfinch.mkv"]; VideoFrameList[v, 3]

We can start with a color negation as a simple “Hello, World!” example:


VideoFrameMap[ColorNegate, v] // VideoFrameList[#, 3] &

Or posterize frames to create a cartoonish effect:

f = With[{tmp = ColorQuantize[#, 16, Dithering -> False]}, tmp - EdgeDetect[tmp]] &;

VideoFrameMap[f, v] // VideoFrameList[#, 3] &
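Any image-to-image function can serve as the filter. Here is a minimal sketch, assuming a grayscale conversion followed by a slight Gaussian blur; the name soften and the radius of 2 are arbitrary choices for illustration:

```wolfram
v = Video["ExampleData/bullfinch.mkv"];

(* Convert each frame to grayscale, then blur it slightly *)
soften = GaussianFilter[ColorConvert[#, "Grayscale"], 2] &;

(* Apply the filter to every frame and preview three frames of the result *)
VideoFrameMap[soften, v] // VideoFrameList[#, 3] &
```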

Use a neural net to perform semantic segmentation on the previously used video of cars:

v = Video["http://exampledata.wolfram.com/cars.avi"];

segment[img_] := Block[{net, encData, dec, mean, var, prob},
  net = NetModel["Dilated ResNet-38 Trained on Cityscapes Data"];
  encData = Normal@NetExtract[net, "input_0"];
  dec = NetExtract[net, "Output"];
  {mean, var} = Lookup[encData, {"MeanImage", "VarianceImage"}];
  Colorize@NetReplacePart[net, {"input_0" -> NetEncoder[{"Image", ImageDimensions@img, "MeanImage" -> mean, "VarianceImage" -> var}], "Output" -> dec}][img]]

VideoFrameList[VideoFrameMap[segment, v], 3]

Next is a video stabilization example, which is a vastly simplified version of this Version 12.0 product example. The input video is another pick from Pixabay:

v = Video["soap_bubble.mp4"]

Here is the mask over the ground to make sure the shaking soap bubble movement does not affect our stabilization algorithm:

mask = CloudGet["https://wolfr.am/Mt580rl0"];


f = Identity;
VideoFrameMap[
  Module[{tmp},
    tmp = Last@FindGeometricTransform[##, TransformationClass -> "Rigid"] & @@
      ImageCorrespondingPoints[Sequence @@ #, Sequence[MaxFeatures -> 25, Method -> "ORB", Masking -> mask]];
    f = Composition[tmp, f];
    ImagePerspectiveTransformation[#[[2]], f, Sequence[DataRange -> Full, Padding -> "Fixed"]]] &,
  v, Quantity[2, "Frames"], Quantity[1, "Frames"]];

From Manipulate to Video


This is a Manipulate from the Wolfram Demonstrations Project:

m = ResourceData["Demonstrations Project: Day and Night World Clock"]

And a video generated from it:



A video can also be generated from a Manipulate and a Sound or Audio object:


Export["file.mp4", {"Animation" -> m, "Audio" -> ExampleData[{"Audio", "PianoScale"}]}, "Rules"] // Video


By default, the Wolfram Language uses the operating system as well as a limited version of FFmpeg for decoding and encoding a large number of multimedia containers and codecs. $VideoEncoders, $VideoDecoders, $AudioEncoders, etc. list the supported encoders and decoders.

Codec support can be expanded even further by installing FFmpeg (Version 4.0.0 or higher). This is the number of decoders and the list of MP4 video decoders on macOS with FFmpeg installed:

Length /@ $VideoDecoders

$VideoDecoders["MP4"][[All, 1]]
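The encoder lists can be queried the same way. This is a sketch that assumes $VideoEncoders has the same structure as $VideoDecoders above (container formats mapped to lists of codec entries); the output depends on the operating system and on whether FFmpeg is installed:

```wolfram
(* Number of available video encoders per container format *)
Length /@ $VideoEncoders

(* Names of the video encoders available for MP4, mirroring the decoder query above *)
$VideoEncoders["MP4"][[All, 1]]
```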

More to Come

Video computation in the Wolfram Language is only just getting started. The features shown here are only part of an already powerful collection of video basics, and we are actively designing and developing updates to existing functions and additional capabilities for future versions, with machine learning and neural net integration at the top of the list. Let us know what you think in the comments; bugs, suggestions and feature requests are always welcome.


