Matterport and Facebook Release Open-Source Spatial Dataset to Train Robots and AI

Companies claim that HM3D is the world's largest spatial dataset for academic research, with digital twins of 1,000 real-world sites.

Matterport has digitized 1,000 real-world spaces for the HM3D library.

Developers need large amounts of reliable data to train artificial intelligence and robots. There are several libraries of objects, but fewer of spaces. Matterport Inc. today said that it and Facebook AI Research are releasing the “largest-ever dataset of 3D indoor spaces available exclusively for academic, non-commercial uses.” 

Matterport said the Habitat-Matterport 3D Research Dataset (HM3D) spatial library includes high-resolution digital twins of real-world environments. The Sunnyvale, Calif.-based company claimed that it has digitized more than 10 billion square feet (about 929 million square meters) of sites in more than 150 countries.

“Until now, this rich spatial data has been glaringly absent in the field, so HM3D has the potential to change the landscape of embodied AI and 3D computer vision,” said Dhruv Batra, a research scientist at Facebook AI Research (FAIR). “Our hope is that the 3D dataset brings researchers closer to building intelligent machines, to do for embodied AI what pioneers before us did for 2D computer vision and other areas of AI.”

“Our engagement with Matterport is specific to the development of this dataset which we’ve been working on for a little over a year,” Batra told Robotics 24/7. “HM3D is being used specifically by Facebook AI researchers for fundamental research in embodied AI, 3D computer vision, and robotics.”

HM3D to advance indoor scene reconstruction

HM3D is free and available now. Researchers can use it with FAIR’s Habitat 2.0 simulator to train embodied agents, such as home robots and AI assistants, at scale. Matterport and FAIR said HM3D “is a foundational step towards helping these agents navigate through real-world environments and better understand the variations of spaces such as bedrooms, bathrooms, kitchen, and hallways, as well as the different configurations of those rooms within every structure.”

“It can also assist robots in recognizing how objects within rooms are typically arranged so that instructions are correctly understood,” said the companies. “This research could one day be used in production applications like robots that can retrieve medicine from a bedroom nightstand or AR glasses that can help people remember where they left their keys.”

“We’ve been challenged with a lack of spatial data to advance innovation in real estate, construction, robotics, augmented reality, and more,” said Yasutaka Furukawa, associate professor of computing science at Simon Fraser University. “But with the HM3D dataset from the collaboration between Matterport and Facebook AI, we're excited about the significant progress we'll make in advancing research in indoor scene reconstruction, generation, and analysis at a house scale for the first time.”

In February, Matterport announced plans to merge with special-purpose acquisition company (SPAC) Gores Holdings VI for $640 million, leading to a valuation of $2.3 billion. RJ Pittman, CEO of Matterport, replied to the following questions:

We've seen open-source libraries of objects for training robots in simulation—why hasn't there been a spatial one until now?

Pittman: Historically, the vast majority of open-source libraries of environments have been synthetic environments. Matterport models represent the most realistic representation of real-world environments, including all of the entropy that comes with the real world.

To gauge interest in the community, Matterport previously released a dataset for academic, non-commercial use in 2017 comprising 90 Matterport spaces. The demand we’ve seen in spatial data, and specifically of Matterport environments, has been overwhelmingly positive. The HM3D dataset represents Matterport’s further commitment to supporting the academic and research community in the further usage of spatial data to spur innovation.

Does the Habitat library focus on generic homes and public spaces, or does it include spaces such as warehouses, factory floors, or hospitals?

Pittman: The HM3D dataset is composed of 1,000 residential and commercial spaces that are high-resolution, data-rich digital twins of various styles, sizes, and complexities from across the world.

How easy will it be for users to modify the digital twins for variability and testing edge cases?

Pittman: The beauty of Matterport's dataset is that you don't need to modify the digital twins to enable variability and testing edge cases, since the dataset is so expansive and captures ground-truth data preserved from real physical spaces. This means that Matterport's dataset uniquely captures the natural variability and diversity of the real world, which lends itself particularly well to edge cases that you wouldn't otherwise be able to easily [reproduce] just in a simulation.

If Matterport is offering this library for free, what does it expect in return? Is the dataset still growing?

Pittman: Our community plays a big role in supporting advancements in data science, to help unlock insights and use cases never seen before. We’re always looking ahead to make the world a better place through the use of spatial data for innovation and make the Matterport digital twin more valuable by making it easily accessible.

With this dataset, our goals are twofold. First and foremost, we wanted to collaborate with Facebook to support the academic and research community in developing new technologies and innovation in spatial data.

Second, by introducing early on the next generation of AI talent currently in academia to Matterport data, we hope to increase the awareness and familiarity of Matterport data which will pay dividends once they matriculate.

Matterport is going public; how will that affect the company?

Pittman: This transaction will give us the capacity to accelerate our growth strategy and extend our leadership position in a number of important ways. We are delivering new products for our customers, like Matterport for iPhone and Android, expanding our third-party platform and developer ecosystem, and scaling globally.

Matterport has been digitizing the built world for a decade, and we are very well-positioned to capitalize on the enormous opportunity ahead.

About the Author

Eugene Demaitre

Eugene Demaitre is editorial director of Robotics 24/7. Prior to joining Peerless Media, he was a senior editor at Robotics Business Review and The Robot Report. Demaitre has also worked for BNA (now part of Bloomberg), Computerworld, and TechTarget. He has participated in numerous robotics-related webinars, podcasts, and events worldwide.
