Two Minute Papers | Daily English Listening


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. This one is going to be absolutely amazing.

This piece of work aims to help a machine build a better understanding of images and 3D geometry. Imagine that we have a large database with these geometries and images, and we can search and compare them with arbitrary inputs and outputs.

What does this mean exactly? For instance, it can handle a text input, such as "school bus", and automatically retrieve 3D models, sketches and images that depict these kinds of objects.

This is great, but we said that it supports arbitrary inputs and outputs, which means that we can use the 3D geometry of a chair as an input, and obtain other, similar-looking chairs from the database. This technique is so crazy, it can even take a sketch as an input and provide excellent-quality outputs.

We can even give it a heatmap of the input and expect quite reasonable results. Typically, these images and 3D geometries contain a lot of information, and to be able to compare which is similar to which, we have to compress this information into a more concise description.

This description offers a common ground for comparisons. We like to call these embedding techniques.

Here, you can see an example of a 2D visualization of such an embedding of word classes. The retrieval from the database happens by compressing the user-provided input and putting it into this space, and fetching the results that are the closest to it in this embedding.
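The retrieval step described above can be sketched in a few lines. This is only a minimal illustration, assuming the embeddings have already been computed: the three-dimensional vectors, the labels, and the function names `cosine_similarity` and `retrieve` are all made up for this example and are not taken from the paper, which may use a different distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, database, k=2):
    """Return the k (label, similarity) pairs closest to query_vec."""
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in database.items()]
    # Highest similarity first, i.e. nearest in the embedding space.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical, hand-made embeddings: a 3D model, a photo and a sketch
# all live in the same vector space, whatever their original format.
database = {
    "bus_model_3d": [0.9, 0.1, 0.0],
    "bus_photo": [0.8, 0.2, 0.1],
    "chair_model_3d": [0.0, 0.9, 0.2],
    "chair_sketch": [0.1, 0.8, 0.3],
}

# Pretend this vector is the embedding of the text query "school bus".
query = [1.0, 0.0, 0.0]
results = retrieve(query, database, k=2)
print([label for label, _ in results])
```

With these toy values, the two bus entries come back first, regardless of whether they started out as a 3D model or as a photo.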

Before the emergence of powerful learning algorithms, these embeddings were typically done by hand. But now, we have these deep neural networks that are able to automatically create solutions for us that are, in some sense, optimal, meaning that according to a set of rules, they will always do better than we would by hand.

We get better results by going to sleep and leaving the computer on overnight than we would have working all night using the finest algorithms from ten years ago. Isn't this incredible?

The interesting thing is that here, we are able to do this for several different representations: for instance, a piece of 3D geometry, or a 2D color image, or a simple word, is being embedded into the very same vector space, opening up the possibility of doing these amazing comparisons between completely different representations. The results speak for themselves.


Shape2vec: Using Artificial Intelligence to Understand 3D Shapes