Point cloud file formats and how to process them

A variety of point cloud file formats are in use today. We look at some of the most widely used formats and see how to process them.

The ever-declining cost of lidar devices, combined with their advantages over cameras (such as measuring distance directly and working regardless of lighting), has led to wider adoption of lidars in a variety of use cases across industries. However, compared to images produced by cameras, lidar outputs, called point clouds, are less accessible because far less software supports them. Below we explore different point cloud file formats and see how they can be accessed through code as well as with simple-to-use free software.

What are point clouds?

A point cloud from the nuScenes dataset. The colors have been projected onto the point cloud from camera images.

Point clouds are essentially massive collections of data points in 3D space. A lidar point cloud is created when an area is scanned using a lidar (light detection and ranging) device. Besides lidar, point clouds are also produced by other 3D sensing techniques such as photogrammetry and radar.

Imagine a room scanned by a highly precise laser that measures the distance to a dense set of points on the walls, floor, ceiling, and any objects inside. Each of these measurements is a data point, and all the data points together form a point cloud. At a minimum, each point has X, Y and Z coordinates. Depending on the type of lidar device, each point can also have:

  1. Intensity values – Intensity indicates how strongly the light pulses emitted by the lidar are reflected back by the surrounding objects. Highly reflective surfaces like street signs produce high intensity values, while surfaces that absorb a lot of light (like roads) produce low ones.
  2. RGB values – Some point clouds also assign an RGB color to each point, typically taken from camera images captured alongside the scan. Color makes the point cloud easier to interpret.
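In memory, a point cloud with these attributes is usually handled as a table with one row per point. As a minimal sketch, assuming float32 coordinates and intensity plus 8-bit RGB channels (the field names and types are illustrative assumptions, not a standard every file follows), a NumPy structured array captures that layout:

```python
import numpy as np

# One record per point; these field names and types are an illustrative choice,
# not a standard that every file follows
point_dtype = np.dtype([
    ("x", np.float32), ("y", np.float32), ("z", np.float32),
    ("intensity", np.float32),
    ("r", np.uint8), ("g", np.uint8), ("b", np.uint8),
])

points = np.zeros(3, dtype=point_dtype)
points["x"] = [1.0, 2.0, 3.0]
points["z"] = [0.5, 0.5, 0.5]
points["intensity"] = [0.9, 0.1, 0.5]  # e.g. a street sign, a road, something in between
points["r"] = [255, 0, 0]

# Geometry-only view as an (N, 3) array, the shape most libraries expect
xyz = np.stack([points["x"], points["y"], points["z"]], axis=1)
print(xyz.shape)             # (3, 3)
print(point_dtype.itemsize)  # 19 bytes per point: 4 x 4-byte floats + 3 x 1-byte colors
```

The 19-byte record size is exactly the kind of layout information a reader has to know in advance when a file stores points as raw bytes.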

You can learn more about lidar point clouds here.

There are different standards for how the information captured by lidar or other scanning devices is stored in files, which is why several point cloud file formats are in use today. Below we describe the most common point cloud file types used in the industry, along with simple Python code snippets showing how to read them.

We will use the Open3D and laspy Python libraries to read and visualize the various formats. If you only want to visualize point clouds, open source software like CloudCompare can open files in several of the formats described below. If, however, you want to annotate point clouds with cuboids or perform point cloud segmentation, we suggest a tool like Mindkosh’s Lidar annotation platform, which is purpose-built for point cloud annotation. The platform supports both intensity and RGB point clouds with large numbers of points. It also supports sensor fusion use cases, where camera images are available in addition to the point clouds.

BIN format

Strictly speaking, BIN is not a format: a .bin file contains no information about the data stored in it. It is simply a raw dump of the points captured by the lidar. To read a bin file correctly, you need to know its layout in advance, such as which attributes each point has, in which order, and which data type they are stored in.

For example, if each point contains its x, y, z coordinates and an intensity value, each stored as a 4-byte float, the following code reads the file and creates an Open3D point cloud from it.

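import numpy as np
import open3d as o3d

# Each point is stored as 4 float32 values: x, y, z, intensity
bin_data_raw = np.fromfile("bin_filename", dtype=np.float32)
pointcloud_points = bin_data_raw.reshape(-1, 4)
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pointcloud_points[:, :3].astype(np.float64))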
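Because nothing in the file confirms the assumed layout, it helps to validate it where you can. A minimal sketch, assuming float32 values and a known number of fields per point; `read_bin_points` is a hypothetical helper, not a library function:

```python
import os
import tempfile

import numpy as np


def read_bin_points(path, num_fields=4, dtype=np.float32):
    """Read a raw .bin point dump into an (N, num_fields) array.

    The file does not record its own layout, so num_fields and dtype are
    assumptions the caller has to supply.
    """
    raw = np.fromfile(path, dtype=dtype)
    if raw.size % num_fields != 0:
        # The values do not split evenly into points: the assumed layout is wrong
        raise ValueError(
            f"{raw.size} values cannot be split into points of {num_fields} fields"
        )
    return raw.reshape(-1, num_fields)


# Round trip: write two points (x, y, z, intensity) as float32 and read them back
demo = np.array([[1.0, 2.0, 3.0, 0.5],
                 [4.0, 5.0, 6.0, 0.25]], dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
demo.tofile(path)

points = read_bin_points(path)
xyz, intensity = points[:, :3], points[:, 3]
print(points.shape)  # (2, 4)
```

The divisibility check catches some layout mistakes but not all. KITTI Velodyne scans, for instance, store 4 float32 values per point, while nuScenes lidar sweeps store 5 (the fifth is a ring index); reading one with the other's layout produces wrong points without any error whenever the total number of values happens to divide evenly.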

PCD file format

A sample PCD file in ASCII format. The header ends at line 11, after which the numerical point data follows.

PCD (Point Cloud Data), the native format of the Point Cloud Library (PCL), is one of the most widely used point cloud formats. It comes in two main flavors, ASCII and binary (plus a compressed binary variant). ASCII PCD files can be opened in a text editor to examine their contents; binary PCD files cannot. In exchange, binary PCD files are much smaller than ASCII ones and faster to read.

Regardless of the flavor, every PCD file contains a header and a body. The header sits at the beginning of the file and describes the contents of the body: which fields each point has, their sizes and data types, the number of points, and whether the data is stored as ASCII or binary.
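To make the header/body split concrete, here is a minimal pure-NumPy sketch of what a PCD reader does: it parses the header line by line, builds a per-point record type from the FIELDS, SIZE and TYPE entries, and then decodes the body according to DATA. `read_pcd` is a hypothetical helper for illustration only; it assumes one value per field (COUNT 1) and little-endian binary data, and it does not handle compressed binary files. For real work, a library such as Open3D is the better choice.

```python
import io
import os
import tempfile

import numpy as np

# (TYPE, SIZE) pairs from a PCD header mapped to NumPy types:
# F = floating point, I = signed integer, U = unsigned integer
PCD_TYPES = {
    ("F", 4): np.float32, ("F", 8): np.float64,
    ("I", 1): np.int8, ("I", 2): np.int16, ("I", 4): np.int32,
    ("U", 1): np.uint8, ("U", 2): np.uint16, ("U", 4): np.uint32,
}


def read_pcd(path):
    """Read a PCD file into a NumPy structured array (one field per PCD field).

    Sketch only: handles DATA ascii and DATA binary with COUNT 1 per field,
    and assumes little-endian binary data. binary_compressed is not supported.
    """
    header = {}
    with open(path, "rb") as f:
        # The header is ASCII text, one "KEY value ..." entry per line; DATA is last
        while "DATA" not in header:
            line = f.readline()
            if not line:
                raise ValueError("no DATA line found in the header")
            text = line.decode("ascii").strip()
            if text and not text.startswith("#"):  # skip comments like "# .PCD v0.7"
                key, *values = text.split()
                header[key] = values
        body = f.read()  # everything after the DATA line is point data

    fields = header["FIELDS"]
    if any(count != "1" for count in header.get("COUNT", [])):
        raise NotImplementedError("fields with COUNT > 1 are not handled here")
    dtype = np.dtype([
        (name, PCD_TYPES[(kind, int(size))])
        for name, size, kind in zip(fields, header["SIZE"], header["TYPE"])
    ])
    num_points = int(header["POINTS"][0])

    encoding = header["DATA"][0]
    if encoding == "ascii":
        # One point per line, values in FIELDS order
        rows = np.loadtxt(io.StringIO(body.decode("ascii")), ndmin=2)
        points = np.empty(num_points, dtype=dtype)
        for i, name in enumerate(fields):
            points[name] = rows[:num_points, i]
        return points
    if encoding == "binary":
        # Points are packed back to back, each with its fields in FIELDS order
        return np.frombuffer(body, dtype=dtype.newbyteorder("<"), count=num_points)
    raise NotImplementedError(f"DATA {encoding} is not handled here")


# Usage: write a small ASCII PCD (11 header lines) and read it back
sample = """# .PCD v0.7 - Point Cloud Data file format
VERSION 0.7
FIELDS x y z intensity
SIZE 4 4 4 4
TYPE F F F F
COUNT 1 1 1 1
WIDTH 2
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2
DATA ascii
1.0 2.0 3.0 0.5
4.0 5.0 6.0 0.25
"""
path = os.path.join(tempfile.mkdtemp(), "sample.pcd")
with open(path, "w") as f:
    f.write(sample)

cloud = read_pcd(path)
print(cloud["x"], cloud["intensity"])
```

The same header drives both branches; only the decoding of the body changes between ASCII and binary, which mirrors how the two flavors differ on disk.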

Open source tools like CloudCompare can open PCD files directly. PCD files can also be read with the Open3D Python library. The following snippet shows how to read a PCD file and access its points and intensity values.

import open3d as o3d

# Load the PCD file with Open3D's tensor-based I/O; the legacy
# o3d.io.read_point_cloud drops extra fields such as intensity
pcd = o3d.t.io.read_point_cloud("your_file.pcd")
points = pcd.point.positions.numpy()
print("Number of points:", len(points))

# Check if the file contains an intensity field
if "intensity" in pcd.point:
    intensities = pcd.point["intensity"].numpy()

PLY format

The PLY (Polygon File Format) format, like PCD, comes in ASCII and binary flavors, and binary PLY can be either little-endian or big-endian. Originally designed for 3D meshes, PLY stores point clouds as a list of vertices. A PLY file consists of a header and a body, where the header describes the layout of the body. The header is ASCII text in both flavors; only the point data after the header differs. A typical PLY header is relatively simple, and looks something like this.

An example PLY file in ASCII format. The first 8 lines describe the header. The lines after the header specify the points themselves.
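Since the header is plain ASCII in both flavors, a few lines of Python are enough to see what a PLY file contains before decoding its body, including whether an intensity property is present and what it is called. A minimal sketch, assuming a well-formed header; `read_ply_header` is a hypothetical helper, not part of Open3D or any other library:

```python
import os
import tempfile


def read_ply_header(path):
    """Return (format, elements) parsed from a PLY header.

    elements maps each element name (e.g. "vertex") to its count and the
    names of its properties, in the order they appear in the body.
    """
    fmt, elements, current = None, {}, None
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file: the first line must be 'ply'")
        for raw in f:
            line = raw.decode("ascii").strip()
            if line == "end_header":
                return fmt, elements
            parts = line.split()
            if not parts or parts[0] in ("comment", "obj_info"):
                continue
            if parts[0] == "format":
                fmt = parts[1]  # ascii, binary_little_endian or binary_big_endian
            elif parts[0] == "element":
                current = parts[1]
                elements[current] = {"count": int(parts[2]), "properties": []}
            elif parts[0] == "property":
                # The property name is the last token, e.g. "property float x"
                elements[current]["properties"].append(parts[-1])
    raise ValueError("no end_header line found")


# Usage: write a small ASCII PLY and inspect its header
sample = """ply
format ascii 1.0
comment written by hand for this example
element vertex 2
property float x
property float y
property float z
property float intensity
end_header
1 2 3 0.5
4 5 6 0.25
"""
path = os.path.join(tempfile.mkdtemp(), "sample.ply")
with open(path, "w") as f:
    f.write(sample)

fmt, elements = read_ply_header(path)
print(fmt, elements["vertex"])
```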

Like PCD files, PLY files can be opened in CloudCompare. To parse a PLY file with Open3D, you can use the following code snippet.

import open3d as o3d

# Load the PLY file with Open3D's tensor-based I/O, which keeps custom
# vertex properties such as intensity
raw_ply_data = o3d.t.io.read_point_cloud("ply_filename.ply")

# Check if intensity values are present
# Note that the intensity property might be named scalar_intensity instead
intensities = None
for name in ("intensity", "scalar_intensity"):
    if name in raw_ply_data.point:
        intensities = raw_ply_data.point[name].numpy()
        break

LAS point cloud format

The LAS format (and LAZ, its losslessly compressed version) is widely used to store aerial point clouds, such as those captured by lidars mounted on drones or airplanes. LAS is more complex than the other formats covered here because it is designed for huge point clouds covering large geographic areas: besides the points themselves, it stores information such as coordinate scale factors and offsets and per-point classification codes.

CloudCompare can open LAS files too, but it may lag when displaying large point clouds on typical computers without a dedicated GPU. If you regularly need to visualize large LAS files, a machine with a capable GPU is recommended. Another option is to split the file into smaller files, each covering a smaller part of the original geographic area. The laspy Python package handles large LAS files efficiently, including reading them in chunks. The snippet below shows how to extract the coordinates from a LAS or LAZ file with laspy. If you want to learn more about the LAS format and how it's structured, here is a good explanation of the LAS format.

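import laspy

# laspy.open returns a reader; the header is available before any points are loaded.
# Reading .laz files requires a LAZ backend, e.g. pip install "laspy[lazrs]"
with laspy.open("filename.laz") as file_handle:
    print("Number of points:", file_handle.header.point_count)

    # read() loads all points into memory; for very large files,
    # file_handle.chunk_iterator(1_000_000) yields the points in chunks instead
    las = file_handle.read()

    # Names of the dimensions (attributes) stored for each point
    print(list(las.point_format.dimension_names))

    # Scaled x, y, z coordinates of the first point
    print(las.x[0], las.y[0], las.z[0])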
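One reason LAS is more involved than the other formats is how it stores coordinates: each point record holds X, Y and Z as 32-bit integers, and the header provides a scale factor and an offset per axis, so the real coordinate is the stored value times the scale plus the offset. laspy applies this for you: `las.X` holds the stored integers, `las.x` the scaled coordinates, and the factors live in `las.header.scales` and `las.header.offsets`. The pure-NumPy sketch below only illustrates the arithmetic; the scale and offset values are made-up assumptions, not taken from a real file.

```python
import numpy as np

# Hypothetical header values: millimetre resolution around a local origin
scales = np.array([0.001, 0.001, 0.001])
offsets = np.array([500000.0, 4100000.0, 0.0])


def to_stored(xyz):
    """Real-world coordinates -> the 32-bit integers stored in the point records."""
    return np.round((xyz - offsets) / scales).astype(np.int32)


def to_real(stored):
    """Stored integers -> real-world coordinates: value * scale + offset."""
    return stored * scales + offsets


xyz = np.array([[500123.4567, 4100456.789, 12.3456]])
stored = to_stored(xyz)
print(stored)           # the integers a LAS file would hold
print(to_real(stored))  # back to coordinates, now rounded to the nearest millimetre
```

The scale factor sets the coordinate resolution (1 mm here), and the offset keeps large geographic coordinates within the range of a 32-bit integer.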

Point cloud file extensions

Point cloud bin files don’t necessarily have a specific file extension since, as mentioned earlier, BIN is not technically a format. In practice, however, most raw binary dumps are saved with the .bin extension.

PCD files, whether binary or ASCII, use the .pcd extension.

Similarly, PLY files use the .ply extension regardless of whether they are ASCII or binary.

LAS point cloud files use the .las extension when uncompressed and .laz when compressed.
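Since extensions are only a convention, and raw BIN dumps may have none at all, a more robust way to tell formats apart is to look at the first bytes of the file where possible: LAS and LAZ files start with the signature "LASF", PLY files start with the line "ply", and PCD headers usually begin with a "# .PCD" comment or a VERSION line. A minimal sketch; `guess_point_cloud_format` is a hypothetical helper, and the PCD check is a heuristic rather than a rule from the specification.

```python
import os
import tempfile
from pathlib import Path


def guess_point_cloud_format(path):
    """Guess a point cloud format from the file's first bytes, then its extension."""
    with open(path, "rb") as f:
        head = f.read(64)
    suffix = Path(path).suffix.lower()
    if head.startswith(b"LASF"):
        # LAS and LAZ share the "LASF" signature; the extension tells them apart here
        return "laz" if suffix == ".laz" else "las"
    if head.startswith(b"ply"):
        return "ply"
    first_line = head.split(b"\n", 1)[0]
    if first_line.startswith((b"# .PCD", b"VERSION")):
        return "pcd"
    # Raw dumps have no signature, so the extension is all that is left to go on
    return "bin" if suffix == ".bin" else "unknown"


# Usage: create a few tiny files and check what is detected
folder = tempfile.mkdtemp()
samples = {
    "scan.las": b"LASF" + bytes(60),
    "scan.laz": b"LASF" + bytes(60),
    "scan.ply": b"ply\nformat ascii 1.0\n",
    "scan.pcd": b"# .PCD v0.7 - Point Cloud Data file format\nVERSION 0.7\n",
    "scan.bin": bytes(32),
}
detected = {}
for name, content in samples.items():
    path = os.path.join(folder, name)
    with open(path, "wb") as f:
        f.write(content)
    detected[name] = guess_point_cloud_format(path)
print(detected)
```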

Conclusion

Lidar point cloud data comes in a variety of formats, each with its own trade-offs for storage and access. ASCII formats are human-readable but large. Binary formats produce much smaller files but cannot be inspected in a text editor, and raw dumps like BIN files can only be read if you already know their layout. While viewing point clouds is not as simple as viewing images, with the right software and an understanding of the formats, you can unlock the wealth of 3D information they contain.
