Three ARKit patents come to light regarding systems for generating highly detailed floorplans based on Room Scanning

Today the US Patent & Trademark Office published a patent application from Apple that generally relates to generating two-dimensional and three-dimensional geometric representations of physical environments, and in particular, to systems, methods, and devices that generate geometric representations based on information detected in physical environments.

The technology relates to Apple’s ARKit. Below is a snippet from a WWDC presentation that at one point touches on the topic of “Scene Geometry” covering LiDAR scanning using the iPad Pro. It covers subject matter relevant to today’s three new patent applications.

Apple notes in their patent background that floorplans play an important role in designing, understanding, and remodeling indoor spaces. Floorplans are generally effective in conveying geometric and semantic information of a physical environment. For instance, a user may view a floorplan to quickly identify room extents, wall structures and corners, the locations of doors and windows, and object arrangements.

There are numerous hurdles to providing computer-based systems that automatically generate floorplans, room measurements, or object measurements based on sensor data. The sensor data obtained regarding a physical environment (e.g., images and depth data) may be incomplete or insufficient to provide accurate floorplans and measurements. For example, indoor environments often contain an assortment of objects, such as lamps, desks, and chairs, that may hide the architectural lines of the room that might otherwise be used to detect the edges of a room and build an accurate floorplan. As another example, images and depth data typically lack semantic information, and floorplans and measurements generated without such information may lack accuracy.

Existing techniques do not allow for automatic, accurate, and efficient generation of floorplans and measurements using a mobile device, for example, based on a user capturing photos, video, or other sensor data while walking about in a room. Moreover, existing techniques may fail to provide sufficiently accurate and efficient floorplans and measurements in real-time environments (e.g., an immediate floorplan or measurement during scanning).

Apple’s invention covers devices, systems, and methods that generate floorplans and measurements using three-dimensional (3D) representations of a physical environment.

The 3D representations of the physical environment may be generated based on sensor data, such as image and depth sensor data. The generation of floorplans and measurements is facilitated in some implementations using semantically-labelled 3D representations of a physical environment.

Some implementations perform semantic segmentation and labeling of 3D point clouds of a physical environment. Techniques disclosed herein may achieve various advantages by using semantic 3D representations, such as a semantically labeled 3D point cloud, encoded onto a two-dimensional (2D) lateral domain. Using semantic 3D representations in 2D lateral domains may facilitate the efficient identification of structures used to generate a floorplan or measurement.
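To make the idea of encoding a semantically labeled 3D point cloud onto a 2D lateral domain more concrete, here is a minimal Python sketch. It is not taken from the patent filing: the function name `project_to_lateral_grid`, the 10 cm cell size, the choice of the y axis as "up", and the majority-vote rule are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def project_to_lateral_grid(points, cell_size=0.1):
    """Collapse labeled 3D points onto a top-down 2D grid.

    points: iterable of (x, y, z, label), in metres; y is assumed to be
    the vertical axis, so it is dropped during projection (assumption).
    Returns a dict mapping (col, row) grid cells to the most common label
    among the points that fall into that cell.
    """
    votes = defaultdict(Counter)
    for x, _y, z, label in points:
        cell = (math.floor(x / cell_size), math.floor(z / cell_size))
        votes[cell][label] += 1
    # Majority vote per cell (an illustrative rule, not the patent's).
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}

# Hypothetical sample points: two wall points and one chair point share a cell.
cloud = [
    (0.02, 0.5, 0.03, "wall"),
    (0.04, 1.5, 0.01, "wall"),
    (0.03, 0.2, 0.06, "chair"),
    (1.25, 0.4, 0.75, "table"),
]
grid = project_to_lateral_grid(cloud)
```

Once the labels sit in a flat grid like this, locating wall or door cells becomes a 2D search rather than a 3D one, which is the kind of efficiency gain the filing describes.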

A floorplan may be provided in various formats. In some implementations, a floorplan includes a 2D top-down view of a room. A floorplan may graphically depict a boundary of a room, e.g., by graphically depicting walls, barriers, or other limitations of the extent of a room, using lines or other graphical features. A floorplan may graphically depict the locations and geometries of wall features such as wall edges, doors, and windows. A floorplan may graphically depict objects within a room, such as couches, tables, chairs, appliances, etc. A floorplan may include identifiers that identify the boundaries, walls, doors, windows, and objects in a room, e.g., including text labels or reference numerals that identify such elements. A floorplan may include indications of measurements of boundaries, wall edges, doors, windows, and objects in a room, e.g., including numbers designating a length of a wall, a diameter of a table, a width of a window, etc.
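As a rough illustration of a floorplan that carries both identifiers and measurements, the Python sketch below models walls as labeled 2D segments and derives a length annotation for each. The class names `Wall` and `Floorplan`, the coordinates, and the text format are hypothetical; the filing does not specify a data format.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Wall:
    label: str    # identifier shown on the plan (hypothetical naming)
    start: tuple  # (x, z) in metres, top-down view
    end: tuple    # (x, z) in metres, top-down view

    @property
    def length(self):
        # Euclidean length of the wall segment in the 2D plan.
        return math.dist(self.start, self.end)

@dataclass
class Floorplan:
    walls: list = field(default_factory=list)

    def annotations(self):
        # Text labels of the kind a plan might print next to each wall.
        return [f"{w.label}: {w.length:.2f} m" for w in self.walls]

# Illustrative room: the dimensions are made up for the example.
plan = Floorplan([
    Wall("wall A", (0.0, 0.0), (4.0, 0.0)),
    Wall("wall B", (4.0, 0.0), (4.0, 3.0)),
])
labels = plan.annotations()
```

Doors, windows, and objects could be added in the same way, each with its own geometry and label.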

According to some implementations, a floorplan is created based on a user performing a room scan, e.g., moving a mobile device to capture images and depth data around the user in a room. Some implementations provide a preview of a preliminary 2D floorplan during the room scanning. For example, as the user walks around a room capturing the sensor data, the user’s device may display a preview of a preliminary 2D floorplan that is being generated.

The preview is “live” in the sense that it is provided during the ongoing capture of the stream or set of sensor data used to generate the preliminary 2D floorplan. To enable a live preview of the preliminary 2D floorplan, the preview may be generated (at least initially) differently than a final, post-scan floorplan.

In one example, the preview is generated without certain post processing techniques (e.g., fine-tuning, corner correction, etc.) that are employed to generate the final, post-scan floorplan. In other examples, a live preview may use a less computationally intensive neural network than is used to generate the final, post-scan floorplan. The use of 2D semantic data (e.g., for different layers of the room) may also facilitate making the preview determination sufficiently efficient for live display.
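The split between a fast live preview and a slower final result can be sketched as a single pipeline with an optional post-processing step. In the Python sketch below, `snap_corners` is a simple stand-in for "corner correction" (it snaps each corner to a 5 cm grid); the function names, the integer-centimetre units, and the snapping rule are assumptions for illustration, not Apple's method.

```python
def snap_corners(corners_cm, step=5):
    """Stand-in for corner correction: snap each corner to a `step` cm grid."""
    return [(round(x / step) * step, round(z / step) * step)
            for x, z in corners_cm]

def generate_floorplan(raw_corners_cm, final=False):
    """Return a preview (raw corners) or a final plan (corrected corners)."""
    corners = list(raw_corners_cm)
    if final:
        # Post-processing is skipped for the live preview to keep it cheap.
        corners = snap_corners(corners)
    return {"stage": "final" if final else "preview", "corners": corners}

# Hypothetical noisy corner estimates for a 4 m x 3 m room, in centimetres.
raw = [(2, -1), (398, 2), (401, 301), (-2, 298)]
preview = generate_floorplan(raw)
final_plan = generate_floorplan(raw, final=True)
```

The preview returns immediately with slightly noisy corners, while the final call pays for the extra correction pass after the scan ends.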

Apple’s patent FIG. 1 below is a block diagram of an example operating environment (#100). In this example, the operating environment 100 illustrates an example physical environment (#105) that includes walls (#130, 132, 134), a chair (#140), a table (#142), a door (#150) and a window (#152). The server (#110) is configured to manage and coordinate an experience for the user.

Apple’s patent FIG. 4 above presents a system flow diagram of an example generation of a semantic three-dimensional (3D) representation using 3D data and semantic segmentation based on depth and light intensity image information.

The system flow of the example environment #400 above can be displayed on a device that has a screen for displaying images and/or a screen for viewing stereoscopic images such as a head-mounted display (HMD).

Apple’s patent FIG. 5 below is a flowchart representation of an exemplary method (#500) that generates and displays a live preview of a preliminary 2D floorplan of a physical environment based on a 3D representation of the physical environment.

Apple’s patent FIG. 10 below is a system flow diagram of an example environment (#1000) in which a system can generate and provide for display a 2D floorplan of a physical environment based on a 3D representation (e.g., a 3D point cloud, a 3D mesh reconstruction, a semantic 3D point cloud, etc.) of the physical environment.

The system flow of the example environment (#1000) acquires, at the floorplan unit, image data (e.g., a live camera feed from a light intensity camera) of a physical environment (e.g., the physical environment #105 of FIG. 1), a semantic 3D representation from a semantic 3D unit, and other sources of physical environment information (e.g., camera positioning information).

Apple’s patent FIG. 12B above is a system flow diagram of an example environment (#1200B) in which an object detection unit (#1220) can generate refined bounding boxes for associated identified objects based on a 3D representation of the physical environment, and a floorplan measurement unit (#1250) can provide measurements of said bounding boxes.

The 3D data, light intensity image data, proposed bounding boxes (#1225a, 1225b), and the stage 1 output are obtained by the fine-tuning stage 2 neural network (#1234), which uses a high precision/low recall neural network to refine the accuracy of the generated features and output refined bounding boxes (#1235a and 1235b, e.g., for table #142 and chair #140, respectively). As illustrated in FIG. 12B, the refined bounding boxes (#1235a and 1235b) are more accurate than the proposed bounding boxes (#1225a and 1225b), respectively.
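To show how measurements might be read off a bounding box, the Python sketch below fits an axis-aligned box around the 3D points of one object and reports its extent along each axis. The helper names `bounding_box` and `dimensions` and the sample points are hypothetical, and the filing's neural-network refinement stages are not modeled here.

```python
def bounding_box(points):
    """Axis-aligned bounding box of (x, y, z) points as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def dimensions(box):
    """Extent of the box along x, y, and z, in the same units as the points."""
    lo, hi = box
    return tuple(h - l for l, h in zip(lo, hi))

# Hypothetical points sampled from a table, in metres (y is assumed vertical).
table_points = [
    (1.0, 0.0, 2.0),
    (2.5, 0.75, 2.0),
    (1.0, 0.75, 3.0),
    (2.5, 0.0, 3.0),
]
box = bounding_box(table_points)
width, height, depth = dimensions(box)
```

A tighter, refined box yields measurements closer to the real object, which is why the accuracy of the second stage matters for the measurement unit.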

Apple’s three patent applications listed below will be better appreciated by ARKit developers:

01: 20210225043 – FLOORPLAN GENERATION BASED ON ROOM SCANNING

02: 20210225090 – FLOORPLAN GENERATION BASED ON ROOM SCANNING

03: 20210225074 – MULTI-RESOLUTION VOXEL MESHING