
LiDAR data processing supercomputing project

Dielmo supercomputing control panel

At Dielmo, we are aware that many of our customers need to speed up the processing of their LiDAR data: the datasets collected for analysis and presentation of results keep growing in size, while the required delivery times keep shrinking.

As a result, processing this data is sometimes not agile enough and delivery times become unworkable.

 

For example, when analysing hazardous vegetation near power lines from LiDAR data, delivery times are very short: after the point cloud has been classified, we usually have two or three days to run the calculations and carry out quality control of the results, so the calculation time must be reduced as much as possible.

 

That is why, in an industry where the data to be processed keeps growing and turnaround times keep tightening, applying new supercomputing techniques to LiDAR data processing is a necessity.

 

In response, Dielmo launched its internal R&D project:

Development of parallelized computing algorithms to increase computational speed in BIG DATA processing on its local network.


Main objectives

Understanding the challenge

The technological challenge was, and still is, one of the most important in the area of geospatial BIG DATA processing.

With the appearance of new sensors for massive LiDAR data capture, such as the Geiger-mode sensor developed by Harris or Leica's single-photon LiDAR, which allow data capture that is 10 times faster and 10 times denser, the natural cyclical evolution of our industry goes through 3 phases:

  • Develop data collection systems that allow a higher volume of capture.
  • Work on the variety of information assets collected that comes as a result of the increase in data volume.
  • Work on improving the speed of processing this data.

Looking for solutions

With this project, Dielmo aimed to provide a solution to this third step, finding a scalable improvement in the speed at which we process our data.

This is a necessary advance worldwide, since there is currently no API for the processing, management and presentation of LiDAR-specific results with a BackEnd that controls the prioritization of the work queues and allows end-to-end management of the data processing.

With this in mind, the project's objectives focused on finding answers to the following questions:

  • How to improve task queuing.
  • How to change priorities or remove tasks from the running queue.
  • How to communicate to consumers the tasks to be performed.
  • How to improve the monitoring of running tasks.
  • How to develop calculation algorithms under the SPMD (Single Program, Multiple Data) paradigm to increase the calculation speed in the processing of BIG DATA within the company’s local network.
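
By way of illustration, the minimal sketch below shows one common way a prioritized work queue can cover the first four points: enqueueing tasks, changing their priority, removing them before they run, and handing them to consumers. The names are hypothetical and this is not Dielmo's actual implementation, just a standard pattern built on Python's heapq module.

```python
import heapq
import itertools

class PriorityTaskQueue:
    """Minimal prioritized task queue: lower priority value runs first."""

    REMOVED = "<removed>"  # placeholder marking cancelled tasks

    def __init__(self):
        self._heap = []            # entries: [priority, insertion_order, task_id]
        self._entries = {}         # task_id -> heap entry
        self._counter = itertools.count()

    def push(self, task_id, priority=10):
        """Add a task, or change its priority if it is already queued."""
        if task_id in self._entries:
            self.remove(task_id)
        entry = [priority, next(self._counter), task_id]
        self._entries[task_id] = entry
        heapq.heappush(self._heap, entry)

    def remove(self, task_id):
        """Mark a queued task as cancelled so consumers skip it."""
        entry = self._entries.pop(task_id)
        entry[-1] = self.REMOVED

    def pop(self):
        """Hand the highest-priority pending task to a consumer (Worker)."""
        while self._heap:
            priority, _, task_id = heapq.heappop(self._heap)
            if task_id != self.REMOVED:
                return task_id
        return None

# Example: queue two tiles, then promote the second one
queue = PriorityTaskQueue()
queue.push("tile_001", priority=10)
queue.push("tile_002", priority=10)
queue.push("tile_002", priority=1)   # re-prioritize
print(queue.pop())                   # -> tile_002
```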

Unlimited scalability

This project also aims to achieve unlimited scalability thanks to the in-house development of the algorithms. With no software license limitations, we can simply add more hardware whenever we need more processing power.

Work plan

The work plan was as follows:

1 - Definition of the global working ecosystem for parallelized computation.

In this first phase, an in-depth study was carried out to make sure we had all the knowledge and tools needed to implement parallelized calculations in LiDAR data processes.

We studied the algorithms usually used in LiDAR data processing in order to adapt them to parallelized computation. Once this review was done, we defined the architecture for parallelized computation and the architecture for managing process queues.

With this, the global working ecosystem was defined.

2 - Definition and development of the 1st WORKER

In this second phase, the first Worker was developed, i.e., we implemented the tasks that the Worker should be able to perform in an automated way.

As a validation criterion, we established a minimum set of tasks that the Worker should be able to perform. To this end, an algorithm was developed that was capable of taking tasks from the work queue as those in execution were completed and, at the same time, of assigning these tasks.

Once these tasks were assigned, the algorithm was defined so that it could deserialize and collect input parameters. 

In addition, the Worker had to be able to collect the input parameters, execute the algorithm and monitor it. 

Finally, it had to obtain the results of the previous processes and copy them to a designated destination.
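
As a rough sketch of such a Worker loop, the code below pulls tasks from a queue, deserializes their parameters, runs the processing algorithm as a monitored subprocess and copies the result to its destination. The queue interface, the lidar_process command and the parameter format are assumptions made for illustration, not the actual Dielmo implementation.

```python
import json
import shutil
import subprocess
from pathlib import Path

def worker_loop(queue, destination: Path):
    """Pull tasks from the queue, run them and copy results to the destination."""
    while True:
        task = queue.pop()                    # next task, as running ones complete
        if task is None:
            break                             # queue drained
        params = json.loads(task["params"])   # deserialize input parameters
        output = Path(params["output"])

        # Execute the processing algorithm as a monitored subprocess
        # ("lidar_process" is a placeholder command, not a real tool name)
        proc = subprocess.run(
            ["lidar_process", params["input"], str(output)],
            capture_output=True, text=True,
        )
        if proc.returncode != 0:
            queue.report_failure(task["id"], proc.stderr)  # hypothetical monitoring hook
            continue

        # Copy the result to the designated destination
        shutil.copy(output, destination / output.name)
        queue.report_done(task["id"])          # hypothetical monitoring hook
```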

3 - Definition, development and validation of the working environment for parallelized computation of simple, composite and sequential jobs.

In the third phase we developed the work blocks needed to operate under the SPMD paradigm, creating 3 types of jobs:
 
  • Simple jobs
    A simple job is the minimum subdivision of the total data to be processed.
    For example, a job processing hazardous vegetation in an area of 1 square kilometer.
  • Composite jobs
    A composite job is a block generated to speed up processing in large projects: the total project is divided into simple jobs.
    For example, a composite job may be divided into 100 simple jobs that are processed with the same algorithm on different machines in parallel.

  • Sequential jobs
    A sequential job adds another layer of complexity. It is a composite job to which different steps or tasks are applied in order. For example: first calculate the hazardous vegetation for every block, then apply the next step, such as vectorization; the vectorization blocks do not start until all the hazardous vegetation blocks have finished.


To summarize, a composite job is made up of a series of simple jobs, and a sequential job is a set of composite jobs executed one after another in order of priority, each waiting for the previous ones to complete before starting.
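
Purely as an illustrative sketch of this hierarchy (the class and field names are assumptions, not Dielmo's code), the three job types can be modelled as nested structures:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SimpleJob:
    """Smallest unit of work, e.g. one 1 km2 tile processed by one algorithm."""
    tile_id: str
    algorithm: str

@dataclass
class CompositeJob:
    """A large project split into simple jobs that run in parallel."""
    name: str
    jobs: List[SimpleJob] = field(default_factory=list)

@dataclass
class SequentialJob:
    """Composite jobs executed one after another: step N+1 waits for step N."""
    name: str
    steps: List[CompositeJob] = field(default_factory=list)

# Example: classify hazardous vegetation on all tiles, then vectorize
vegetation = CompositeJob(
    "hazard_vegetation",
    [SimpleJob(f"tile_{i:03d}", "hazard_vegetation") for i in range(100)],
)
vectorize = CompositeJob(
    "vectorize",
    [SimpleJob(f"tile_{i:03d}", "vectorize") for i in range(100)],
)
pipeline = SequentialJob("powerline_survey", [vegetation, vectorize])
```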

The SPMD paradigm is a common technique for achieving parallel computation of different data with the same program. It separates the data into smaller subsets and applies the same algorithm to each one individually.

In the case of geographical problems, this is equivalent to dividing the data to be processed into tiles, for example tiles of one square kilometre (1 km2), and computing each one on a different machine, which allows us to increase speed significantly.
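
A minimal SPMD-style sketch of this tiling idea in Python, with placeholder tiling and processing functions rather than Dielmo's actual algorithms, could look like this:

```python
from multiprocessing import Pool

TILE_SIZE = 1000.0  # tile side in metres (1 km2 tiles)

def tile_key(x: float, y: float):
    """Assign a point to its 1 km x 1 km tile."""
    return (int(x // TILE_SIZE), int(y // TILE_SIZE))

def split_into_tiles(points):
    """Group (x, y, z) points into tiles: the 'multiple data' of SPMD."""
    tiles = {}
    for x, y, z in points:
        tiles.setdefault(tile_key(x, y), []).append((x, y, z))
    return list(tiles.values())

def process_tile(tile_points):
    """The 'single program' applied to every tile; here, a toy height statistic."""
    return max(z for _, _, z in tile_points)

if __name__ == "__main__":
    points = [(100.0, 200.0, 5.2), (1500.0, 300.0, 12.8), (120.0, 250.0, 7.1)]
    tiles = split_into_tiles(points)
    with Pool() as pool:                 # one tile per worker process
        results = pool.map(process_tile, tiles)
    print(results)
```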

4 - Development of the API and BackEnd for management and control of parallelized calculations on the work queues.

In this phase, an API for queue management was developed. This API took shape as testing was carried out for all types of jobs that can occur: simple, composite and sequential.

Next, BackEnd testing and debugging were performed to stabilize the overall process and verify that it was indeed working perfectly. 

In addition, in this phase, the new computing ecosystem was integrated into DIELMO’s internal production system.
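
As an illustration only, a queue-management API of this kind could expose endpoints for submitting, listing and cancelling jobs. The sketch below uses FastAPI with invented routes and fields as assumptions; it is not the API actually developed in the project.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
QUEUE = []   # in-memory stand-in for the real work queue

class JobRequest(BaseModel):
    name: str
    job_type: str      # "simple", "composite" or "sequential"
    priority: int = 10

@app.post("/jobs")
def submit_job(job: JobRequest):
    """Queue a new job for the workers."""
    QUEUE.append(job.dict())
    return {"status": "queued", "position": len(QUEUE)}

@app.get("/jobs")
def list_jobs():
    """Monitoring endpoint, e.g. for a Backend view of the queue."""
    return sorted(QUEUE, key=lambda j: j["priority"])

@app.delete("/jobs/{name}")
def cancel_job(name: str):
    """Remove a job from the queue before a worker picks it up."""
    global QUEUE
    QUEUE = [j for j in QUEUE if j["name"] != name]
    return {"status": "cancelled", "name": name}
```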

Backend supercomputing LiDAR

Results

The results obtained were very positive. 

We confirmed an improvement in calculation speed of up to 100 times compared to what is usual among companies in our industry.

In addition, we were able to observe significant improvements in many other aspects, which we mention below: 

  • Reduction of data processing time, giving a quick response to the client and increasing processing speed by up to 100 times.
  • Being the first company to have this competitive advantage for processing the large volumes of data generated with LiDAR technology.
  • Reduction of the final cost of data processing.
  • A scalable solution that can be expanded according to needs by increasing the number of processors.
  • A solution that can be extended to any spatial data format, not only data collected with LiDAR technology, so it is ready for new capture technologies that produce even larger datasets.
  • Creation of an API tool with a visual Backend environment that allows end-to-end management of BIG DATA processing.
  • Development of algorithm patterns that allow the current sequential calculation algorithms, developed as extensions of SEXTANTE within gvSIG, to be adapted to the parallel environment.
  • Ability to queue jobs from any PC on the local network through a GIS environment such as gvSIG, for subsequent monitoring through the Backend.
  • Development of an easy-to-use visual environment for managing spatial data calculation queues.
  • A parallel computing architecture that allows geospatial algorithms to be adapted to distributed computing and tasks to be queued visually from any computer on the local network using a Geographic Information System (GIS).
  • Improvement of BIG DATA processing speed, in particular for LiDAR data, of about 100 times with respect to previous speeds.
  • Elimination of bottlenecks in the processing of this type of data.
  • API tool for work queue management.
  • Backend visual web management tool.

This gives us an important advantage by improving the efficiency of LiDAR data processing, benefiting industries as diverse as conventional cartography, power line maintenance, forestry applications and hydrology.

Beyond this, the project positions Dielmo as the only company worldwide able to offer LiDAR data processing services using parallel computing algorithms, together with an API tool and a visual Backend web environment that allow integrated management of the processing of this data.

This research project was co-funded through the Center for the Development of Industrial Technology (CDTI) and by the European Regional Development Fund (ERDF), project: IDI-20170969 between 2017 and 2018.

European Regional Development Fund
