Wednesday, September 7, 2016

The formal closing and overview of this summer


Project: Integrating Sentinel-2 data into Marble

Short recap of the project

ESA Sentinel is a series of next-generation Earth observation missions that provide high-quality satellite image data of Earth. Marble Virtual Globe is an open-source globe that allows users to explore a 3D model of Earth, Mars, Venus, and the Moon, with a wide variety of maps ranging from political to topographic. One of Marble's most important features is its flexibility: it is designed to be integrated into a multitude of different software, and has been used extensively in third-party applications in the past. By improving Marble's maps and functionality, developers get access to a free map viewer that they may use without restriction in their applications.
This project's goal was to find a way to adapt Sentinel-2 data, which is currently available to all users at the Sentinels Scientific Data Hub, into Marble, for easy viewing and access by all users.

Goals and achievements for this project

The goals for this project can be summarized in four main points:
  1. Finding a process that allows us to use the Sentinel-2 data in Marble
  2. Improving this process
  3. Using this process to gather and adapt as much data as possible
  4. Adapting different map sets

Adapting different map sets


The TopOSM map theme in Marble.

My first task, meant to get me acquainted with Marble and map themes, was to create a map theme from TopOSM. This task introduced the concept of slippy maps, which we would also use in the creation of the Sentinel-2 map theme. It also involved uploading to the KDE servers, to which we would later upload the Sentinel data once it was ready for use in Marble.

Above, the area around San Francisco in Marble's Satellite map theme. This was the best option users had for satellite imagery in Marble before this project. Below, the same area, using Sentinel-2 data.

Above, San Francisco area at a higher zoom level, showing the difference in populated areas. Marble’s original satellite data from the Blue Marble Next Generation image set is not high enough quality to adequately portray these areas. Below, the same area adapted from Sentinel-2 data.


Above, San Francisco area at the highest zoom level for the map theme.

The second task was finding software capable of taking the available image data, stored in JP2 files that each cover one subtile of a dataset in a single spectral band, and converting it into a GeoTIFF file with realistic colors, suitable for the creation of the slippy map tiles that can be used in Marble.
After some research into libraries such as GDAL (the Geospatial Data Abstraction Library), the free and open-source geographic information system QGIS was found to have all the features needed for this step. At this point a rough process was already in place for the creation of the TIFF files; however, it required intensive user interaction and, as a result, was much too slow to be practical.
At this point, I was tasked with finding a way to improve or automate some of these steps. This led me to the QGIS Developer Cookbook, which was a great help in understanding the inner workings of the software in order to automate it. QGIS has support for batch scripting, which seemed like a suitable way to minimize the user interaction needed.
This part of the project taught me a lot about how software is structured, as I had to delve into the inner workings of QGIS to find the specific rendering settings we needed. After a lot of script testing and documentation reading, the script that saves the TIFF files was complete; with it, one can process multiple datasets at the same time, without having to supervise and interact with the program every minute.
The next step in the data processing was taking the newly created TIFF files, which have realistic colors, and converting them into slippy map tiles. After researching possible solutions with GDAL, a QGIS plugin, QTiles, was found to be suitable for this step. The plugin takes the data we gathered and divides it into slippy map tiles, such as the ones used in OpenStreetMap. We can then host these files on the KDE servers, so that Marble can access them as needed. The only issue we currently face with this step is that it is very time-consuming: while a few datasets can be rendered in a day, rendering the hundreds we have acquired would take at least two weeks of continuous rendering, even on a computer dedicated to the task. This is because slippy maps are built on the concept of quadtiles: every zoom level has four times as many tiles as the previous one, and since Marble needs zoom levels up to level 14, the number of files, although each is small in size, grows exponentially.
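The quadtile growth mentioned above is easy to quantify. Here is a quick back-of-the-envelope sketch in Python (the level-14 cutoff comes from Marble's needs; the rest is just the quadtree definition, not part of our actual scripts):

```python
# Each zoom level z of a slippy map holds 4**z tiles for full global
# coverage, because every tile splits into four at the next level.
def tiles_at_level(z):
    return 4 ** z

def total_tiles(max_level):
    # Sum of the geometric series 4^0 + 4^1 + ... + 4^max_level.
    return sum(tiles_at_level(z) for z in range(max_level + 1))

if __name__ == "__main__":
    print(tiles_at_level(14))  # tiles on level 14 alone: 268435456
    print(total_tiles(14))     # all tiles up to level 14: 357913941
```

Level 14 alone accounts for roughly 268 million tiles for the whole globe, which is why restricting the rendered area (and skipping the oceans, as described below) matters so much.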
A solution to this problem was to convert the TIFF image data into slippy map tiles in batches, one "batch" being enough datasets to cover a level-6 tile (for example, in OpenStreetMap). These slippy tiles can later be combined, with the help of a script that removes the white edges surrounding them. The amount of data that needs to be processed can also be cut down by taking the bathymetry from a different map set. This way we preserve the high quality of the Sentinel-2 land data, while not having to convert any of the ocean tiles.
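To see which level-6 tile a given dataset falls into, the standard slippy-map tile-name formula (as documented on the OpenStreetMap wiki) can be used. This is a generic sketch for illustration, not one of our project scripts:

```python
import math

def deg2num(lat_deg, lon_deg, zoom):
    """Convert WGS84 coordinates to slippy-map tile x/y indices at the
    given zoom level (Web Mercator tiling, as used by OpenStreetMap)."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# San Francisco falls into tile (10, 24) at zoom level 6.
print(deg2num(37.77, -122.42, 6))
```

Every dataset whose coordinates map to the same level-6 tile belongs to the same batch.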
The last step is simply uploading the tiles to the servers, which makes them available in Marble.
At this point we have gathered over 130 datasets, and have created the tiff images that can be processed into slippy map tiles for more than 100 of them. The current process has been documented so that future efforts may build upon this work.

Future projects and how Marble benefits

The future of this project is to turn it into a community-driven effort. One of the main considerations was to find a way of creating the slippy tiles that could be easily set up and run by many users. The amount of data is immense; however, as a community project based on contributing to an existing pool of data, converting all of the Sentinel-2 data becomes achievable.
The end result of this project is the creation of a foundation upon which future efforts can build to cover all of Earth's land with satellite data. Programs such as Google Code-in can also improve upon this foundation and help it move toward a community-centered project.
As the amount of data grows, Marble users will be the first to have access to such high-quality imagery, which means any open-source developer can use it in their applications. Preparations have already begun for making it a community-based project, such as an online viewer showing how much of the Earth is currently covered by the satellite data. This gives contributing users an easy way to check which parts of the Earth still need more tiles. As for the long processing times, a server-side processing solution is also possible, so that contributors only need to upload the created images and the creation of slippy tiles can be handled by the servers.
In conclusion, the project has laid the groundwork for future efforts on Sentinel-2 data integration, which will make Marble Virtual Globe the first of its kind to possess data of this quality, open for users all around the world to create and develop with.

Wednesday, August 3, 2016

My experiences with SOCIS 2016



Hello dear readers!

This post is a small synopsis of my experiences so far as a student in this year's Summer of Code in Space, where I shall recount the whole adventure of integrating Sentinel-2 data into Marble Virtual Globe.

Sentinel-2 data.

So what exactly is this data, and why is it important to us? 

Well, Copernicus is the world's largest single Earth observation programme, directed by the European Commission in partnership with the European Space Agency (ESA). ESA is currently developing seven missions under the Sentinel programme. Among these is Sentinel-2, which provides high-resolution optical images that are of interest both to the users of Marble and to the scientific community as a whole.

Our goal with this year's SOCIS was to adapt this data for Marble. Since Marble has quite the track record of being used in third-party applications with great success, this would essentially be a gateway for many developers to get easy access to high-quality images through the Marble library.
So, first order of business? Adapt the world. The summer was beginning to get exciting.

First Acquaintance with Marble   

Of course, nothing happens that quickly, and the first task was, naturally, on a smaller scale. In order to familiarize me with the inner workings of Marble, my mentor gave me the task of adapting an already available dataset into a map theme for Marble. This is how I came to know TopOSM.


The TopOSM map theme in Marble.

This task came with its own fair share of challenges, from getting Marble to display the legend icons correctly to creating a suitable level-0 tile, but in the end it gave an insight into exactly how a map theme is created, from the first steps all the way to uploading. At this point the challenge was underway, and so began the real part of our ambitious project: to tackle the whole world through Sentinel-2's lens and integrate it into Marble.

Sentinel-2 - From drawing board to tile rendering

After many discussions with my mentor about how to make the data suitable for use in Marble, we finally came up with a plan. That plan would let us use the currently available Data Hub as the source for our images (since, unlike with TopOSM, we don't have a simple server we could just fetch the data from). Then we just have to edit these images into a suitable format, and everything will be fine. A three-step process:
Step 1. Download some data.
Step 2. Edit it.
Step 3. Use it in Marble.
As you may have guessed, step 2 was to be troublesome. Around this time my mentor came up with the first iteration of the guide for this "three-step" process. We also found an application that would suit our needs for the editing: this was to be QGIS, though we would also be using GDAL.
The now mostly finalized guide can be found here, but the original steps were as follows:
Step 1. Find some suitable data (few clouds, not too dark, etc.) on the Data Hub, and download it.

The Data Hub.

This step was expected to be the least troublesome, since what do you need? A good internet connection (check?) and hard drive space, since each dataset we download is about 4-7 gigabytes (also check). The only problem was that the downloads seemed to fail without warning, rhyme, or reason. One could move the mouse constantly and a download might fail; one could leave the computer unattended and the same thing would happen, even though the previous 5 datasets had downloaded without fail.
It was quite a mystery, but thankfully the browser could restart the download (after refreshing the page and logging in again to make sure). Another helpful site was an archive where some of the more recent datasets were uploaded. These could easily be downloaded with wget without any issues, so the troublesome downloading (the Data Hub only allows two concurrent downloads at a time) was more or less solved.
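For the archive, plain wget was enough. A sketch of the kind of invocation involved (the list file name here is a placeholder for illustration, not our actual setup):

```shell
# -c resumes interrupted downloads instead of starting over;
# -i reads the dataset URLs, one per line, from a text file.
wget -c -i sentinel2-dataset-urls.txt
```

Being able to resume with -c was exactly what the browser downloads from the Data Hub lacked.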

Step 2.  Edit the data.


Here's how a few tilesets look when loaded, after applying the styles.

So you've finally downloaded your first dataset. The first thing you have to do is run a small script that generates VRT files from the images. These are then loaded into QGIS, where you need to apply a style to them (that's right-click on each layer, load style, and navigate to the style file; at first, the numbers stored in the style file were even entered by hand). Thankfully you only have to do that for the first layer, and can then copy the style to the rest of them. Even with hotkeys, that's about 10-16 styles to apply. But now you can save your images in TIFF format! Just right-click, save as, apply the correct settings, and... wait 1 to 4 minutes while it generates. Now do all of that again for the other 15 layers. And for all the other datasets.
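The "small script that generates vrt files" boils down to GDAL's gdalbuildvrt tool, which can stack the per-band JP2 images into one virtual raster. A hedged sketch (the file names follow the Sentinel-2 band naming scheme, but the exact paths are illustrative, not taken from our script):

```shell
# Stack the red (B04), green (B03) and blue (B02) 10 m bands of one
# Sentinel-2 granule into a single virtual raster. The -separate flag
# keeps each input file as its own band instead of mosaicking them.
gdalbuildvrt -separate granule_rgb.vrt \
    T10SEG_B04.jp2 T10SEG_B03.jp2 T10SEG_B02.jp2
```

The resulting .vrt is just a small XML wrapper, so it loads instantly in QGIS while the heavy JP2 data stays on disk untouched.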
Any reader may have felt profound despair at this point; so did I. My mentor most likely did as well, as we both, quite rightly, felt that there had to be a better, faster, more efficient way.
Who knew that the whole solution for all this would be a script?
So from then on, I was knee-deep in the documentation, finding exactly which classes store file-saving settings in QGIS (hint: it's this one). Applying the style was something I had seen used in plugins, so that was a good stepping stone.
A second great discovery was that QGIS provides an easy way to generate the query window through the Processing Toolbox (so I didn't have to meddle with the appearance, just find the relevant settings in the documentation).
So much easier.
Soon, the first version of the script was ready: load the VRTs in QGIS, open the script window, select where you want to save the output and which styles you want to use, and presto. An hour later you might be done (with that dataset; on to the next!). My mentor was quite happy that you no longer had to sit there and apply settings every 1 to 4 minutes, just once every half an hour or so (to load up the next batch). The sky, or in this case the processing power of your computer, was the limit.
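The heart of such a script, as far as I can sketch it here, is PyQGIS's raster API. This is a simplified, hypothetical reconstruction rather than the actual production script: the paths are made up, it has to run inside QGIS's Python environment, and the style-baking step is only hinted at in a comment.

```python
# Runs inside the QGIS Python console / script runner, not standalone.
from qgis.core import QgsRasterLayer, QgsRasterFileWriter, QgsRasterPipe

def save_styled_tiff(vrt_path, style_path, out_path):
    layer = QgsRasterLayer(vrt_path, "sentinel-granule")
    if not layer.isValid():
        raise RuntimeError("could not load " + vrt_path)
    # Apply the saved style that maps raw band values to realistic colors.
    layer.loadNamedStyle(style_path)
    # Feed the layer's data provider into a pipe and write it as GeoTIFF.
    # NOTE: to bake the applied style into the output rather than the raw
    # band values, the layer's renderer must also be set on the pipe;
    # that detail is omitted here for brevity.
    provider = layer.dataProvider()
    pipe = QgsRasterPipe()
    pipe.set(provider.clone())
    writer = QgsRasterFileWriter(out_path)
    writer.writeRaster(pipe, provider.xSize(), provider.ySize(),
                       provider.extent(), provider.crs())
```

Wrapping a call like this in a loop over all 16 layers is what removed the minute-by-minute clicking.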

The last step in editing involves the creation of the actual slippy map tiles. Thankfully, that was already available as a QGIS plugin (QTiles), so we didn't have to find another way. Tile creation, however, is a very slow process (it takes more than a day to process about 10 datasets), so this step is still a bit problematic. Even splitting the project into smaller sections doesn't do much speed-wise, but the reason is fairly clear: there are 15 levels to create, with level 0 being a single image of the entire Earth, level 1 being four images, level 2 being those four split into four again, and you can soon see that at level 14, there are a great many tiles being generated. Such as it is.

Step 3. Upload and use the tiles in Marble.
This step is fairly obvious: you need to upload your freshly generated tiles onto the Marble servers, and soon you will be able to see the fruits of your labour with your own eyes. As of this post, more than 70 tilesets have been generated and uploaded, but there's still a long way to go.

Concluding

For now I’m just glad to say that I’ve had a wonderful experience this summer here at Marble, by having helpful mentors around who welcomed me into the community, heard all my issues and tried to support me whenever I got stuck. Overall I’m really happy that I took part, because I learned a lot about communication, project management, problem solving, both on my own, and with help. Of course, this is just the beginning of everything, and I hope to become much more productive and helpful in the future. As for everyone, I wish you all a great summer and many great experiences to you all.