EarthView dataset
Overview
The EarthView Dataset is a comprehensive collection of multispectral earth imagery. The dataset is divided into four distinct subsets sourced from Satellogic, Sentinel-1, Sentinel-2 (coming soon), and NEON imagers, each providing unique data.
Dataset Viewer
Check the EarthView Dataset Viewer and its code for examples of how to access the images and navigate the dataset.
Data Sources
Each subset (AKA configuration) in the EarthView dataset includes samples representing specific patches of the Earth. Each source (satellite type) has different characteristics, so the details for the samples in each of the subsets are subtly different. We provide a very simple library to access the images in the subsets.
Available Subsets
| Name | Samples | Unique locations | Products | Image Resolution |
| --- | --- | --- | --- | --- |
| Satellogic | ~6 million | ~3 million | RGB | 1m |
| | | | NIR (Near Infrared) | 1m |
| Neon | ~1 million | ~0.3 million | RGB | 0.1m |
| | | | Canopy Height Model (Lidar) | 1m |
| | | | Hyperspectral (369 bands) | 1m |
| Sentinel-1 | ~5.2 million | ~1 million | SAR (mapped to RGB) | 10m (from 20m) |
| Sentinel-2 | ~10 million | ~1 million | RGB | 10m |
| | | | NIR | 10m |
| | | | NIR / Red Edge / SWIR | 20m |
| | | | Scene Classification Layer | 20m |
| | | | Coastal-Aerosol / Water Vapour / Cirrus | 40m (from 60m) |
Data Format
Each subset has some peculiarities and a specific data format in the dataset. Each item (sample) in the dataset is a dictionary with a `metadata` field and one or more entries for the different image products available, such as `rgb`, `chm`, `1m`, `10m` (see below). All of the image fields are 4D arrays whose dimensions are REVISITS, BANDS, HEIGHT, WIDTH.
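To make the axis order concrete, here is a minimal sketch that uses a synthetic NumPy array in place of a real image field. The array contents and the variable names are illustrative assumptions; only the (REVISITS, BANDS, HEIGHT, WIDTH) layout comes from the description above.

```python
import numpy as np

# Synthetic stand-in for one image field: 2 revisits, 3 bands, 384 x 384 pixels.
field = np.zeros((2, 3, 384, 384), dtype=np.uint8)

# Unpack the documented axis order.
revisits, bands, height, width = field.shape

# Many imaging tools expect channels last, so move the band axis to the end:
# (REVISITS, BANDS, HEIGHT, WIDTH) -> (REVISITS, HEIGHT, WIDTH, BANDS)
channels_last = np.transpose(field, (0, 2, 3, 1))

# Select the first revisit as a single (HEIGHT, WIDTH, BANDS) image.
first_revisit = channels_last[0]

print(field.shape, channels_last.shape, first_revisit.shape)
```

Indexing the first axis selects a revisit, so looping over `channels_last` yields one image per capture date.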
We encourage you to use the supplied `earthview` library to simplify accessing the dataset, metadata and images. (For now, download the single Python file and place it in the same directory as your scripts, or somewhere on Python's path.)
Below you'll find details for each subset.
Satellogic
Metadata
| key | description |
| --- | --- |
| bounds | [x_min, y_min, x_max, y_max] (bottom-left corner and top-right corner coordinates in (easting, northing) format, WGS 84 / UTM). |
| | [178191.0, 8248444.0, 178575.0, 8248828.0] |
| | [[171507.0, 7874670.0, 171891.0, 7875054.0], [174795.0, 7878747.0, 175179.0, 7879131.0]] |
| crs | EPSG code |
| | ['EPSG:32723'] |
| timestamp | list of timestamps corresponding to the capture dates (only the date is valid) |
| | ['2022-07-21T00:00:00'] |
| | ['2022-07-21T00:00:00', '2022-07-25T00:00:00'] |
| count | Number of re-visits for the location (not present in the dataset, generated by item_to_images()) |
| | 1 |
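As a quick sanity check on the `bounds` field, the sketch below computes the ground footprint of each box from the example values in the table. The helper names `as_bounds_list` and `patch_size_m` are hypothetical and not part of the `earthview` library. The example metadata shows both a single flat box and a list of boxes, so the first helper normalizes both shapes.

```python
def as_bounds_list(bounds):
    """Return bounds as a list of [x_min, y_min, x_max, y_max] boxes."""
    # A single box is a flat list of four numbers;
    # several revisits appear as a list of such boxes.
    if bounds and not isinstance(bounds[0], (list, tuple)):
        return [list(bounds)]
    return [list(box) for box in bounds]


def patch_size_m(box):
    """Width and height of one box in metres (UTM coordinates are metres)."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min, y_max - y_min)


# Example values copied from the metadata table above.
single = [178191.0, 8248444.0, 178575.0, 8248828.0]
double = [
    [171507.0, 7874670.0, 171891.0, 7875054.0],
    [174795.0, 7878747.0, 175179.0, 7879131.0],
]

sizes_single = [patch_size_m(box) for box in as_bounds_list(single)]
sizes_double = [patch_size_m(box) for box in as_bounds_list(double)]
print(sizes_single, sizes_double)
```

Each box spans 384 m per side, which matches 384 x 384 pixels at 1m resolution.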
Images
Satellogic images are captured with Satellogic's MarkIV satellite fleet. The payload produces RGB and NIR images at 1m native resolution (no PAN sharpening). Each sample in the dataset has 1 or 2 re-visits per location.
| key | Product | Image Resolution | Size (pixels) | Bands | Re-visits |
| --- | --- | --- | --- | --- | --- |
| rgb | RGB | 1m | 384 x 384 | 3 | 1 or 2 |
| 1m | NIR | 1m | 384 x 384 | 1 | 1 or 2 |
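Because the `rgb` and `1m` (NIR) products share the same 384 x 384 grid, they can be combined pixel by pixel. The sketch below computes the standard NDVI formula, (NIR - Red) / (NIR + Red), on synthetic arrays. It assumes that band 0 of `rgb` is the red channel, and the function name `ndvi` is hypothetical. If the stored pixel values are not calibrated reflectance, treat the result as a relative indicator only.

```python
import numpy as np


def ndvi(rgb, nir):
    """NDVI per revisit.

    rgb: array of shape (REVISITS, 3, H, W), band 0 assumed to be red.
    nir: array of shape (REVISITS, 1, H, W).
    Returns an array of shape (REVISITS, H, W).
    """
    red = rgb[:, 0].astype(np.float64)
    nir_band = nir[:, 0].astype(np.float64)
    denom = nir_band + red
    out = np.zeros_like(denom)
    # Leave pixels with a zero denominator at 0 instead of dividing by zero.
    np.divide(nir_band - red, denom, out=out, where=denom != 0)
    return out


# Synthetic data standing in for sample['rgb'] and sample['1m'].
rgb = np.full((1, 3, 4, 4), 50, dtype=np.uint8)
nir = np.full((1, 1, 4, 4), 150, dtype=np.uint8)

index = ndvi(rgb, nir)
print(index.shape, index[0, 0, 0])
```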
Examples (Jupyter Notebook)
import numpy as np
import earthview as ev

# Load one shard of the Satellogic subset and take its first sample
data = ev.load_dataset("satellogic", shards=[10])
sample = next(iter(data))
print(sample.keys())
print(np.array(sample['rgb']).shape)
print(np.array(sample['1m']).shape)

Output:

dict_keys(['1m', 'rgb', 'metadata'])
(1, 3, 384, 384)
(1, 1, 384, 384)

# Convert the raw arrays to PIL images (one per re-visit)
sample = ev.item_to_images("satellogic", sample)
display(sample)
display(*sample["rgb"])
display(*sample["1m"])

Output:

{'1m': [<PIL.Image.Image image mode=L size=384x384>],
 'rgb': [<PIL.Image.Image image mode=RGB size=384x384>],
 'metadata': {'bounds': [[178191.0, 8248444.0, 178575.0, 8248828.0]],
  'crs': ['EPSG:32723'],
  'timestamp': ['2022-08-13T00:00:00'],
  'count': 1}}
Sentinel-1
Metadata
| key | description |
| --- | --- |
| type | 'Polygon'. Indicates the coordinates are the vertices of a Polygon |
| coordinates | Five coordinates of a closed Polygon. |
| | [[[434520.0, 8715520.0], [438360.0, 8715520.0], [438360.0, 8711680.0], [434520.0, 8711680.0], [434520.0, 8715520.0]]] |
| crs | EPSG code |
| | 'epsg:32736' |
| count | Number of re-visits for the location (not present in the dataset, generated by item_to_images()) |
| | 6 |
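The Sentinel-1 metadata stores a closed polygon rather than a bounding box. A minimal sketch for deriving [x_min, y_min, x_max, y_max] from it follows, using the example coordinates from the table. The function name `polygon_bounds` is hypothetical and not part of the `earthview` library.

```python
def polygon_bounds(coordinates):
    """Bounding box [x_min, y_min, x_max, y_max] of the polygon's outer ring."""
    ring = coordinates[0]  # the first (and here only) ring of the polygon
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return [min(xs), min(ys), max(xs), max(ys)]


# Example value copied from the metadata table above.
coordinates = [[
    [434520.0, 8715520.0],
    [438360.0, 8715520.0],
    [438360.0, 8711680.0],
    [434520.0, 8711680.0],
    [434520.0, 8715520.0],
]]

bounds = polygon_bounds(coordinates)
width_m = bounds[2] - bounds[0]
height_m = bounds[3] - bounds[1]
# A closed ring repeats its first vertex as the last one.
is_closed = coordinates[0][0] == coordinates[0][-1]
print(bounds, width_m, height_m, is_closed)
```

The footprint is 3840 m per side, which corresponds to 384 pixels at 10m resolution.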
Images
Sentinel-1 carries a Synthetic Aperture Radar (SAR) payload. The data (imagery) produced has two channels, for vertical and horizontal polarization. The dataset stores just these two channels; item_to_images() applies a standard mapping to return RGB images (see the example below). Samples in the dataset contain varied numbers of re-visits per location.
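To illustrate what turning two SAR channels into a three-channel image can look like, here is a minimal sketch on synthetic data. This is not the mapping that item_to_images() implements, which is not described here. The function name `sar_to_rgb` and the choice of the channel mean for the third band are assumptions made purely for illustration.

```python
import numpy as np


def sar_to_rgb(revisit):
    """Map one revisit of shape (2, H, W) to a uint8 image of shape (H, W, 3).

    Illustrative only: red = channel 0, green = channel 1,
    blue = mean of both channels, each stretched to 0..255.
    """
    first = revisit[0].astype(np.float64)
    second = revisit[1].astype(np.float64)

    def stretch(band):
        # Min-max stretch to 0..1; a constant band maps to all zeros.
        lo, hi = band.min(), band.max()
        if hi == lo:
            return np.zeros_like(band)
        return (band - lo) / (hi - lo)

    planes = [stretch(first), stretch(second), stretch((first + second) / 2)]
    return (np.stack(planes, axis=-1) * 255).astype(np.uint8)


# Synthetic stand-in for one revisit of sample['10m'].
rng = np.random.default_rng(0)
revisit = rng.random((2, 8, 8))

image = sar_to_rgb(revisit)
print(image.shape, image.dtype)
```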
| key | Product | Image Resolution | Size (pixels) | Bands | Re-visits |
| --- | --- | --- | --- | --- | --- |
| 10m | SAR (mapped to RGB by item_to_images()) | 10m | 384 x 384 | 2 (3 after RGB mapping) | varies |
Examples (Jupyter Notebook)
import numpy as np
import earthview as ev

# Load a single parquet file of the Sentinel-1 subset and take its first sample
data = ev.load_parquet("dataset/sentinel_1/train-00088-of-01763.parquet")
sample = next(iter(data))
print(sample.keys())
# Sentinel-1 samples only have a '10m' image field (no 'rgb' key)
print(np.array(sample['10m']).shape)

Output:

dict_keys(['10m', 'metadata'])
(6, 2, 384, 384)

# Convert the two-channel arrays to RGB PIL images (one per re-visit)
sample = ev.item_to_images("sentinel_1", sample)
display(sample)
display(*sample["10m"][:2])

Output:

{'10m': [<PIL.Image.Image image mode=RGB size=384x384>,
  <PIL.Image.Image image mode=RGB size=384x384>,
  <PIL.Image.Image image mode=RGB size=384x384>,
  <PIL.Image.Image image mode=RGB size=384x384>,
  <PIL.Image.Image image mode=RGB size=384x384>,
  <PIL.Image.Image image mode=RGB size=384x384>],
 'metadata': {'type': 'Polygon',
  'crs': 'epsg:32736',
  'coordinates': [[[434520.0, 8715520.0],
    [438360.0, 8715520.0],
    [438360.0, 8711680.0],
    [434520.0, 8711680.0],
    [434520.0, 8715520.0]]],
  'count': 6}}
Neon
In the dataset, the subset/configuration for NEON is called `default`, but when using the earthview library you should call it `neon`.
Metadata
| key | description |
| --- | --- |
| bounds | [x_min, y_min, x_max, y_max] (bottom-left corner and top-right corner coordinates in (easting, northing) format). |
| | [-82.04138011662944, 29.634596313943526, -82.04071312100113, 29.635179014437973] |
| epsg | EPSG code |
| | 'EPSG:32617' |
| timestamp | list of timestamps corresponding to the capture dates (only the date is valid) |
| | ['2018-01-01T00:00:00', '2019-01-01T00:00:00', '2021-01-01T00:00:00'] |
| siteID | 'OSBS' for Ordway-Swisher Biological Station |
| count | Number of re-visits for the location (not present in the dataset, generated by item_to_images()) |
| | 3 |
Images
The NEON subset is composed of very high resolution RGB images at 0.1m, 1m hyperspectral data (369 bands), and a 1m Canopy Height Model derived from a LiDAR sensor. Every sample in the dataset contains 3 re-visits.
| key | Product | Image Resolution | Size (pixels) | Bands | Re-visits |
| --- | --- | --- | --- | --- | --- |
| rgb | RGB | 0.1m | 640 x 640 | 3 | 3 |
| chm | Canopy Height Model | 1m | 64 x 64 | 1 | 3 |
| 1m | Hyperspectral | 1m | 64 x 64 | 369 | 3 |
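The three NEON products cover the same 64 m footprint at different pixel sizes (640 pixels at 0.1m versus 64 pixels at 1m). One simple way to compare them pixel by pixel is to upsample the 1m products by a factor of 10 with nearest-neighbour repetition, as sketched below on synthetic data. The variable names are illustrative, and other resampling methods may suit your task better.

```python
import numpy as np

# Synthetic stand-in for sample['chm'] with a single revisit: (1, 1, 64, 64).
chm = np.arange(64 * 64, dtype=np.float32).reshape(1, 1, 64, 64)

# 640 RGB pixels span the same ground distance as 64 CHM pixels.
factor = 640 // 64

# Nearest-neighbour upsampling: repeat every pixel along height, then width.
chm_up = np.repeat(np.repeat(chm, factor, axis=2), factor, axis=3)

print(chm.shape, chm_up.shape)
```

After upsampling, `chm_up[r, 0, y, x]` lines up with pixel `(y, x)` of revisit `r` in the 640 x 640 RGB image.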
When using item_to_images(), the hyperspectral images are mapped from the 369 bands to RGB using an arbitrary mapping with no physical meaning. Please don't use it for anything other than an example.
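If you need your own preview of the hyperspectral cube, one simple option is to average groups of bands into three channels. The sketch below does this on synthetic data. It is not the mapping used by item_to_images(), the function name `false_color` is hypothetical, and splitting the 369 bands into three equal groups is an arbitrary choice with no claim about which wavelengths each group covers.

```python
import numpy as np


def false_color(cube):
    """Collapse a (BANDS, H, W) cube to a uint8 (H, W, 3) preview image."""
    # Split the band axis into three contiguous groups and average each one.
    groups = np.array_split(cube.astype(np.float64), 3, axis=0)
    image = np.stack([group.mean(axis=0) for group in groups], axis=-1)

    # Stretch the whole image to 0..255; a constant image maps to zeros.
    lo, hi = image.min(), image.max()
    if hi > lo:
        image = (image - lo) / (hi - lo)
    else:
        image = np.zeros_like(image)
    return (image * 255).astype(np.uint8)


# Synthetic stand-in for one revisit of sample['1m']: (369, 64, 64).
rng = np.random.default_rng(0)
cube = rng.random((369, 64, 64))

preview = false_color(cube)
print(preview.shape, preview.dtype)
```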
Examples (Jupyter Notebook)
import numpy as np
import earthview as ev

# Load one shard of the NEON subset and take its first sample
data = ev.load_dataset("neon", shards=[100])
sample = next(iter(data))
print(sample.keys())
print(np.array(sample['rgb']).shape)
print(np.array(sample['chm']).shape)
print(np.array(sample['1m']).shape)

Output:

dict_keys(['1m', 'chm', 'rgb', 'metadata'])
(3, 3, 640, 640)
(3, 1, 64, 64)
(3, 369, 64, 64)

# Convert the raw arrays to PIL images (one per re-visit)
sample = ev.item_to_images("neon", sample)
display(sample)
display(sample["rgb"][0])
display(sample["1m"][0])
display(sample["chm"][0])

Output:

{'1m': [<PIL.Image.Image image mode=RGB size=64x64>,
  <PIL.Image.Image image mode=RGB size=64x64>,
  <PIL.Image.Image image mode=RGB size=64x64>],
 'chm': [<PIL.Image.Image image mode=L size=64x64>,
  <PIL.Image.Image image mode=L size=64x64>,
  <PIL.Image.Image image mode=L size=64x64>],
 'rgb': [<PIL.Image.Image image mode=RGB size=640x640>,
  <PIL.Image.Image image mode=RGB size=640x640>,
  <PIL.Image.Image image mode=RGB size=640x640>],
 'metadata': {'bounds': [-82.04138011662944,
   29.634596313943526,
   -82.04071312100113,
   29.635179014437973],
  'epsg': 'EPSG:32617',
  'siteID': 'OSBS',
  'timestamp': ['2018-01-01T00:00:00',
   '2019-01-01T00:00:00',
   '2021-01-01T00:00:00'],
  'count': 3}}
Sentinel-2 (Coming Soon)
Check the EarthView Dataset Viewer and its code to get an idea of how to use it.
Known Issues
- The time component of timestamps is not correct.
- Bounds for some images are not correct. We are investigating the cause.
We intend to update the dataset to fix these issues when possible.
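Since only the date part of each timestamp is reliable, it can help to drop the time component when reading the metadata. A minimal sketch using the standard library follows. The function name `capture_dates` is hypothetical, and the input values are copied from the Satellogic metadata example.

```python
from datetime import date, datetime


def capture_dates(timestamps):
    """Parse ISO timestamps and keep only the calendar date."""
    return [datetime.fromisoformat(ts).date() for ts in timestamps]


dates = capture_dates(['2022-07-21T00:00:00', '2022-07-25T00:00:00'])

# Number of days between the two re-visits.
gap_days = (dates[1] - dates[0]).days
print(dates, gap_days)
```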