At Dielmo, we are aware of our customers' need to speed up the processing of their LiDAR data: the datasets collected for analysis and presentation of results keep growing in size, while the required delivery times keep shrinking.
As a result, processing these data is sometimes too slow and the delivery times are not operationally viable.
For example, when analyzing hazardous vegetation near power lines from LiDAR data, delivery times are very short: after the classification of the point cloud, we usually have only two or three days to perform the calculations and run quality control on the results, so calculation time must be reduced as much as possible.
That is why, in an industry where the data to be processed keeps growing and turnaround times keep tightening, applying new supercomputing techniques to LiDAR data processing is a necessity.
In response, Dielmo launched its internal R&D project:
Development of parallelized computing algorithms to increase computational speed in BIG DATA processing on its local network.
The technological challenge was, and still is, one of the most important in the area of geospatial BIG DATA processing.
With the appearance of new sensors for massive LiDAR data capture, such as the Geiger-Mode sensor developed by Harris or the single photon LiDAR from Leica, which capture data 10 times faster and 10 times denser, the natural cyclical evolution of our industry goes through three phases:
With this project, Dielmo aimed to provide a solution to the third step, finding a scalable improvement in the speed at which we process our data.
It is a necessary advance worldwide, since there is currently no API for the processing, management and presentation of LiDAR-specific results with a BackEnd that controls the prioritization of the work queues and allows end-to-end management of the data processing.
With this in mind, the project's objectives have focused on finding answers to the following questions:
This project also aims to achieve unlimited scalability thanks to the in-house development of the algorithms: with no software license limitations, we can simply add more hardware whenever we need more processing power.
The work plan was as follows:
In this first phase, we carried out an in-depth study to make sure we had all the knowledge and tools needed to implement parallelized calculations in LiDAR data processes.
We studied the algorithms usually used in LiDAR data processing in order to adapt them to parallelized computation. Once this review was done, we defined the architecture for parallelized computation and the architecture for the management of process queues.
With this, the global working ecosystem was defined.
In this second phase, the first Worker was developed, i.e., the set of tasks that a Worker should be able to perform in an automated way.
As a validation parameter, we established a minimum set of tasks that the Worker should be able to perform. To this end, an algorithm was developed that could remove tasks from the work queue as those in execution were completed, and at the same time assign these tasks.
Once these tasks were assigned, the algorithm was defined so that it could deserialize and collect the input parameters.
In addition, the Worker had to be able to take those input parameters, execute the algorithm and monitor its progress.
Finally, it had to obtain the results of the previous processes and copy them to a designated destination.
Summarizing: a complex job is made up of a series of simple jobs, and a sequential job is a set of complex jobs executed one after the other in order of priority, each waiting for the previous one to complete before the next one starts.
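The Worker behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not Dielmo's actual implementation: the class, task format and queue layout are all assumptions made for the example.

```python
import json
import queue

class Worker:
    """Hypothetical sketch of a Worker: it pulls tasks from a priority
    queue, deserializes their input parameters, executes the matching
    algorithm and copies the result to a destination."""

    def __init__(self, algorithms):
        # algorithms: mapping from task name to a callable.
        self.algorithms = algorithms

    def run(self, task_queue, results):
        # Keep removing tasks from the work queue until it is empty,
        # highest-priority (lowest number) first.
        while True:
            try:
                priority, serialized = task_queue.get_nowait()
            except queue.Empty:
                break
            task = json.loads(serialized)            # deserialize input parameters
            algorithm = self.algorithms[task["name"]]
            output = algorithm(**task["params"])     # execute the algorithm
            results[task["id"]] = output             # copy result to destination

# Two simple tasks with different priorities; priority 0 runs first.
q = queue.PriorityQueue()
q.put((1, json.dumps({"id": "t1", "name": "double", "params": {"x": 21}})))
q.put((0, json.dumps({"id": "t0", "name": "double", "params": {"x": 5}})))

results = {}
Worker({"double": lambda x: 2 * x}).run(q, results)
print(results)  # {'t0': 10, 't1': 42}
```

In a real deployment the queue would live in the BackEnd and be shared by many Workers on different machines; here an in-process queue stands in to show the loop.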
The SPMD (Single Program, Multiple Data) paradigm is a common technique for achieving parallel computation of different data with the same program: the data are separated into smaller subsets and the same algorithms are applied to each one individually.
In the case of geographical problems, this is equivalent to dividing the data to be processed into tiles, for example tiles of one square kilometer (1 km²), and computing each one on a different machine, which increases speed significantly.
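A minimal sketch of this tiling approach, under the assumption that points are simple (x, y, z) tuples in metres. In production each tile would be sent to a different machine; here a thread pool stands in for the cluster, and `process_tile` is a trivial stand-in for a real LiDAR algorithm.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

TILE = 1000.0  # tile size in metres (1 km x 1 km)

def split_into_tiles(points):
    # Group (x, y, z) points by the 1 km tile that contains them.
    tiles = defaultdict(list)
    for x, y, z in points:
        tiles[(int(x // TILE), int(y // TILE))].append((x, y, z))
    return tiles

def process_tile(points):
    # Stand-in for a real algorithm: highest elevation in the tile.
    return max(z for _, _, z in points)

points = [(120.0, 250.0, 12.5), (980.0, 40.0, 7.1), (1500.0, 300.0, 22.0)]
tiles = split_into_tiles(points)

# SPMD: the same program runs on every tile, in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(zip(tiles, pool.map(process_tile, tiles.values())))

print(results)  # {(0, 0): 12.5, (1, 0): 22.0}
```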
In this phase, an API for queue management was developed. This API took shape as testing was carried out for all the types of jobs that can occur: simple, complex and sequential.
Next, BackEnd testing and debugging were performed to stabilize the overall process and verify that it worked as intended.
In addition, in this phase, the new computing ecosystem was integrated into DIELMO’s internal production system.
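The three job types the API was tested against can be modelled compactly. This is a hedged sketch of that job model, not Dielmo's actual API: a simple job is a callable, a complex job is a series of simple jobs, and a sequential job runs complex jobs in priority order, each waiting for the previous one to finish.

```python
def run_simple(job):
    # A simple job is just one unit of work.
    return job()

def run_complex(jobs):
    # A complex job is a series of simple jobs run in order.
    return [run_simple(j) for j in jobs]

def run_sequential(complex_jobs):
    # complex_jobs: list of (priority, simple-job list). Lower numbers
    # run first; each complex job waits for the previous to complete.
    results = []
    for _, jobs in sorted(complex_jobs, key=lambda pair: pair[0]):
        results.append(run_complex(jobs))
    return results

out = run_sequential([
    (2, [lambda: "classify", lambda: "export"]),
    (1, [lambda: "tile"]),
])
print(out)  # [['tile'], ['classify', 'export']]
```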
The results obtained were very positive.
We confirmed an improvement in calculation speed of up to 100 times over what is usual among companies in our industry.
In addition, we were able to observe significant improvements in many other aspects, which we mention below:
This gives us an important edge, improving the efficiency of LiDAR data processing in industries as diverse as conventional cartography, power line maintenance, forestry applications and hydrology.
And not only this: with this project, Dielmo is positioned as the only company worldwide able to offer LiDAR data processing services using parallel computing algorithms, together with an API tool with a visual BackEnd web environment that allows integrated management of the processing of these data.
This research project was co-funded by the Center for the Development of Industrial Technology (CDTI) and the European Regional Development Fund (ERDF) through project IDI-20170969, between 2017 and 2018.