Open MULTIDRONE Datasets

Artificial Intelligence & Information Analysis

This is a webpage for distributing the UAV video datasets created/assembled within the MULTIDRONE project. Overall, 36 individual datasets with a total size of approximately 260 GB have been constructed to facilitate scientific research in tasks such as visual detection and tracking of bicycles, rowing boats, human crowds, etc. A large subset of these data has been annotated. Anyone using any part of these datasets in their work is kindly asked to cite the following two papers:

  • I. Mademlis, A. Torres-Gonzalez, J. Capitan, M. Montagnuolo, A. Messina, F. Negro, C. Le Barz, T. Goncalves, R. Cunha, B. Guerreiro, …, and I.Pitas, “A multiple-UAV architecture for autonomous media production”, Springer Multimedia Tools and Applications, pp. 1-30, 2022 (DOI: 10.1007/s11042-022-13319-8).
  • I. Mademlis, N.Nikolaidis, A.Tefas, I.Pitas, T. Wagner and A. Messina, “Autonomous UAV Cinematography: A Tutorial and a Formalized Shot-Type Taxonomy”, ACM Computing Surveys, vol. 52, issue 5, pp. 105:1-105:33, 2019 (DOI: 10.1145/3347713).

Additional papers need to also be cited for specific datasets (details below).

In order to access the datasets created/assembled by Aristotle University of Thessaloniki, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas (using “MULTIDRONE public datasets availability and download credentials” as e-mail subject) so as to receive FTP credentials for downloading.

In order to access the datasets created/assembled/provided by other MULTIDRONE partners (RAI, Deutsche Welle), please complete and sign this license agreement. Subsequently, email it to Alberto Messina so as to receive FTP credentials for downloading.

AUTH Multidrone Datasets
RAI/Deutsche Welle Multidrone Datasets

 

AUTH Multidrone Datasets

To acquire these datasets, please complete and sign this license agreement. Subsequently, email it to Prof. Ioannis Pitas so as to receive FTP credentials for downloading.

If you are granted access, the following datasets are available (NOTE: for datasets assembled from YouTube videos, only links to the videos and the relevant annotation files, if any, are provided).

 

-Human Crowd Datasets

Several image/video datasets depicting human crowds from aerial views have been assembled and/or annotated in the context of MULTIDRONE. Anyone using any part of these datasets in their work is kindly asked to additionally cite the following three papers:

  • C. Papaioannidis, I. Mademlis and I.Pitas, “Autonomous UAV safety by visual human crowd detection using multi-task Deep Neural Networks”, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2021 (DOI: 10.1109/ICRA48506.2021.9560830).
  • M. Tzelepi and A. Tefas, “Graph-embedded Convolutional Neural Networks in human crowd detection for drone flight safety”, IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 5(2), pp. 191-204, 2019 (DOI: 10.1109/TETCI.2019.2897815).
  • M. K. Lim, V. J. Kok, C. C. Loy, C. S. Chan, “Crowd saliency detection via global similarity structure”, Proceedings of the IEEE International Conference on Pattern Recognition (ICPR), 2014 (DOI: 10.1109/ICPR.2014.678).

The individual human crowd datasets are detailed below.

-DCROWD_VID

A dataset for visual human crowd detection was assembled from YouTube videos, licensed mainly under the Standard YouTube License. It is a collection of 53 videos selected by querying the YouTube search engine with specific keywords describing crowded events (e.g., parade, festival, marathon, protest). Non-crowded videos were also gathered by searching for generic drone videos. No annotation is currently available.

-Aerial_Crowd_Auth

An aerial crowd detection dataset was created by annotating videos captured by two different RGB cameras (Olympus, Sony) placed ~10 m above the ground, recording a human crowd from different viewing angles. The videos were partially annotated, resulting in 563 1920×1080 RGB images along with their segmentation maps, which consist of two classes (‘crowd’, ‘non-crowd’). The segmentation maps are available as .png images, where pixels belonging to the ‘crowd’ class are colored red, while ‘non-crowd’ pixels are black.
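For illustration, the red/black convention above can be decoded into a binary mask as follows. This is a minimal sketch assuming NumPy (with the image loaded, e.g., via Pillow); the exact red shade used in the maps is an assumption, so tolerant thresholds are applied:

```python
import numpy as np

def crowd_mask(rgb):
    """rgb: H x W x 3 uint8 array of a ground-truth map (e.g. loaded via
    Pillow as np.asarray(Image.open(path).convert("RGB"))).
    Returns a boolean H x W array, True where the pixel is red ('crowd')."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Tolerant thresholds, since the exact red shade is not specified.
    return (r > 127) & (g < 64) & (b < 64)
```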

-UAV_Crowd_Seville

A UAV crowd detection dataset was created by annotating videos captured by three different operating UAV cameras. The videos were partially annotated, resulting in 603 1920×1080 RGB images along with their segmentation maps, which consist of two classes (‘crowd’, ‘non-crowd’). The ground-truth segmentation maps are available as .png images, where pixels belonging to the ‘crowd’ class are colored red, while ‘non-crowd’ pixels are black.

-AirsimCrowd

A synthetic crowd detection dataset obtained from the UAV simulation software AirSim. It consists of 602 RGB synthetic images at a resolution of 640 x 360, obtained by simulating two different UAV flight scenarios. Each RGB image is annotated with its corresponding segmentation map, where each pixel is assigned a class label belonging to one of the two classes, crowd and non-crowd. In addition to the corresponding segmentation maps, the RGB segmentation images are provided.

-DroneCrowd

A composite human crowd detection dataset from aerial views. It consists of 1199 training and 591 test RGB images that depict human crowds in a wide range of scenes (urban, countryside, day, night) captured at varying altitudes (from low to very high). It combines images from the MULTIDRONE datasets DCROWD_VID, Aerial_Crowd_Auth and UAV_Crowd_Seville. The image resolution varies from 480 x 360 to 1920 x 1080 pixels. Each RGB image is annotated with its corresponding segmentation map, where each pixel is assigned a class label belonging to one of the two classes, i.e., crowd and non-crowd. In addition to the segmentation maps, the RGB segmentation images are provided.
 
 
SHOT_TYPES
A dataset containing 46 professional and semi-professional UAV videos was assembled from YouTube material. Care was taken to include as many UAV framing shot types and UAV/camera motion types as possible, based on the UAV shot type taxonomy defined in the context of the MULTIDRONE project. No annotation is currently available.
Annotations_boats_Raw
A dataset for boat detection/tracking was assembled, consisting of 13 YouTube videos (resolution: 1280 x 720) at 25 frames per second. Annotations are not exhaustive, i.e., there may be unannotated objects in the given video frames. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.
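For illustration, the frameN / #objects / bounding-box layout described above could be parsed as follows. This is a sketch, assuming one whitespace-separated “x y w h” line per object; it has not been validated against the actual annotation files:

```python
def parse_annotations(lines):
    """Parse frameN / #objects / 'x y w h' annotation text lines.
    Returns {frame_number: [(x, y, w, h), ...]}."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    boxes, i = {}, 0
    while i < len(lines):
        frame = int(lines[i])          # frameN
        count = int(lines[i + 1])      # #objects in this frame
        boxes[frame] = [tuple(map(int, lines[i + 2 + k].split()))
                        for k in range(count)]
        i += 2 + count
    return boxes
```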

Annotations_Bicycles_Raw
A dataset for bicycle detection/tracking was assembled, consisting of 7 YouTube videos (resolution: 1920 x 1080) at 25 frames per second. Annotations are not exhaustive, i.e., there may be unannotated objects in the given video frames. An annotation file is included along with each video file. The annotations are stored in the corresponding text files with the following format:
    Channel   frameN   ObjectID   x1   y1   x2   y2   0   ObjectType/View

where x1, y1, x2, y2 refer to the upper left and bottom right corners of the bounding box, ObjectID is a numerical object identifier (non-consistent, non-reliable), frameN is the video frame number, while ObjectType/View (where applicable) labels the object class and categorical pose relative to the camera (“1F” means Front View, “1B” means Back View, “1L” means Left View, “1R” means Right View, “2” means Bicycle Crowd, “5H” means High-Density Human Crowd, “5L” means Low-Density Human Crowd, “0” denotes irrelevant TV graphics).
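As a sketch, a single annotation line in this format could be read as follows (whitespace separation between the fields is an assumption):

```python
def parse_bicycle_line(line):
    """Parse one 'Channel frameN ObjectID x1 y1 x2 y2 0 ObjectType/View' line."""
    f = line.split()
    return {
        "channel": f[0],
        "frame": int(f[1]),
        "object_id": int(f[2]),                                # non-reliable ID
        "box": (int(f[3]), int(f[4]), int(f[5]), int(f[6])),   # x1 y1 x2 y2
        "label": f[8],  # ObjectType/View code, e.g. '1F', '2', '5H'
    }
```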

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • P. Nousi, I. Mademlis, I. Karakostas, A. Tefas and I.Pitas, “Embedded UAV Real-time Visual Object Detection and Tracking”, Proceedings of the IEEE International Conference on Real-time Computing and Robotics (RCAR), 2019 (DOI: 10.1109/RCAR47638.2019.9043931).

Benchmark_RAI

A dataset for bicycle detection/tracking was prepared by processing/editing and annotating material made available by RAI under the “Giro 2017” MULTIDRONE dataset. It consists of two videos (resolutions: 768 x 432 and 960 x 540) at 25 frames per second, taken from the Giro d’Italia TV coverage provided by RAI. Annotations are exhaustive, i.e., all objects of a certain class present in a given image are covered by an annotation. An annotation file is included along with each video file. The annotations are stored in text files with the following format:
    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • P. Nousi, I. Mademlis, I. Karakostas, A. Tefas and I.Pitas, “Embedded UAV Real-time Visual Object Detection and Tracking”, Proceedings of the IEEE International Conference on Real-time Computing and Robotics (RCAR), 2019 (DOI: 10.1109/RCAR47638.2019.9043931).
AUTH-Persons
AUTH-Persons is a visual person detection dataset, containing 53 real and synthetic videos, totaling approximately 37 minutes of footage. Overall, 4 of those videos (≈6 min) were collected from a DJI Phantom 4 UAV during flights over the campus of the Aristotle University of Thessaloniki. The rest of the videos were collected in virtual environments, using the AirSim simulator and a set of maps designed in Unreal Engine 4. All maps were populated with a large number of humans and obstacles (e.g., trees, structures, etc.), in order to achieve a high level of occlusion. The images were collected from a virtual UAV orbiting at various altitudes.
A separate annotation file (.txt) is provided for each video frame. Each line refers to a bounding box in the corresponding video frame in the following format:

x_center y_center width height
The specified coordinates are normalized, i.e., expressed relative to the width/height of the corresponding video frame.
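For illustration, a normalized annotation line can be mapped back to pixel coordinates given the frame size. This is a sketch assuming the center-based, width/height-normalized convention stated above:

```python
def to_pixel_box(line, frame_w, frame_h):
    """Convert one normalized 'x_center y_center width height' annotation
    line into a pixel-space (x, y, w, h) box with top-left origin."""
    xc, yc, w, h = map(float, line.split())
    x = (xc - w / 2) * frame_w   # left edge in pixels
    y = (yc - h / 2) * frame_h   # top edge in pixels
    return round(x), round(y), round(w * frame_w), round(h * frame_h)
```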

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

      • C. Symeonidis, I. Mademlis, I. Pitas, N. Nikolaidis, “AUTH-Persons: A dataset for detecting humans in crowds from aerial views”, Proceedings of the IEEE International Conference on Image Processing (ICIP), 2022 (DOI: 10.1109/ICIP46576.2022.9897612).
AUTHDroneSunday_VID
A dataset for visual human crowd detection was collected, in the form of 6 videos shot inside the AUTH Campus using a DJI Phantom IV UAV. The videos depict a crowd of visitors during an “AUTH at Sundays” event. The video format is UHD 2160p, with a resolution of 4096 x 2160 at a rate of 25 frames per second. There are two scenes: the first contains a sparse crowd moving near exhibition stands, and the second a dense static crowd watching a presentation given by AUTH students. The second scene has 5 videos shot from different view angles. No annotation is currently available.

uav_detection
A dataset was prepared by AUTH for visual drone detection. It consists of 12 Full HD videos (1080p – 1920 x 1080) filmed using two cameras. The cameras were pointed in the general direction of a flying DJI Phantom IV. The drone is shot against various backgrounds, including the sky, trees, buildings and roads. In 11 out of the 12 videos, the two cameras are at ground level looking up at the drone, maintaining a bottom view of it. In the last video, the camera is at the same elevation as the drone or higher, maintaining mostly side and top views of it. The total video duration is 31 minutes. About 39K video frames were annotated for drone detection, with annotations of the following format:

    frame_number, number_of_drones, x_min, y_min, width, height
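A single line in this format could be parsed as follows (a sketch, assuming comma-separated integer values as quoted above):

```python
def parse_drone_line(line):
    """Parse one 'frame_number, number_of_drones, x_min, y_min, width, height'
    annotation line. Returns (frame, n_drones, (x_min, y_min, w, h))."""
    vals = [int(v) for v in line.split(",")]
    frame, n = vals[0], vals[1]
    return frame, n, tuple(vals[2:6])
```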

uav_detection_2
A dataset for drone detection was collected using one camera held by a person on the ground, within the AUTH campus. In total, 11 Full HD videos were produced, depicting a DJI Phantom IV against various backgrounds, at multiple sizes and views. The total duration of this dataset is 15 minutes, or about 22K frames at 25 fps. No annotation is currently available.

landing_sites
A dataset of videos depicting potential UAV landing sites has also been captured. It consists of 2 videos (at a resolution of 4096 x 2160 pixels and an approximate total duration of 5 minutes) captured by a DJI Phantom IV within the AUTH campus, containing potential landing sites around a point of interest (POI), or generally in the university campus. The potential landing sites include terrain locations characterized by small terrain slope and no obstacles, so as to maximize the possibility of safe UAV landing. No annotation is currently available.

AUTHObservatory_VID
A dataset named “AUTHObservatory_VID” was also collected by AUTH for building/Point-of-Interest detection purposes. It consists of two videos shot inside the AUTH Campus using a DJI Phantom IV UAV, containing the observatory building with its telescope dome. This is a unique building in the campus that can be considered a Point-Of-Interest in the context of the other buildings. The video format is UHD 2160p, with a resolution of 4096 x 2160 at a rate of 25 frames per second. The view angles include a top view and a 360° perspective of the building sides from a height of 30-50 m. No annotation is currently available.

face_deid_UAV
A dataset for face de-identification consists of one 3840×2160 video, shot by a flying DJI Phantom IV. The drone flew at a height of about 3-5 meters with its camera pointed downwards, recording the subjects walking by and occasionally looking directly at it. The total video duration is 45 seconds at a framerate of 25 fps. Each face in the 1124 extracted frames is annotated with a bounding box, using the pixel coordinates of its top left corner followed by its width and height, also in pixels. The annotations are thus in the following format:

    frame_number, number_of_faces, bounding box for each face in this frame
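Such a variable-length line could be parsed as follows (a sketch assuming comma-separated values and one "x y w h" quadruple per face, per the bounding-box description above):

```python
def parse_face_line(line):
    """Parse one 'frame_number, number_of_faces, <4 values per face>' line.
    Returns (frame, [(x, y, w, h), ...])."""
    vals = [int(v) for v in line.replace(",", " ").split()]
    frame, n = vals[0], vals[1]
    boxes = [tuple(vals[2 + 4 * i: 6 + 4 * i]) for i in range(n)]
    return frame, boxes
```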

face_deidentification_UAV_mult_views
A dataset for face de-identification purposes was collected by a DJI Phantom IV UAV and consists of one 4096 x 2160 video. The UAV flew at a height of about 3-5 meters, while the subjects were recorded from multiple viewpoints, walking by and occasionally looking directly at it. The total duration of the video is 2 minutes and 23 seconds at a framerate of 25 fps. No annotation is currently available.

Annotations_eights_DW_raw
A dataset for boat detection/tracking was created using footage from DW, consisting of 3 videos (resolution: 1280 x 720) subsampled at 25 frames per second. An annotation file is included along with each video file. The annotations are stored in text files with the following format:

    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • F. Patrona, P. Nousi, I. Mademlis, A.Tefas and I.Pitas, “Visual Object Detection For Autonomous UAV Cinematography”, Proceedings of the Northern Lights Deep Learning Workshop (NLDL), 2020 (DOI: 10.7557/18.5099).
UAV_Parkour
A UAV dataset for parkour athlete detection was assembled from 8 YouTube videos, depicting both male and female athletes performing parkour in different landscapes, under different lighting conditions. The annotations provided are stored in the format:

    • frameN
    • #objects
    • x y w h

where x, y denote the upper left corner of the bounding box and w, h its width and height. As spectators are also depicted in the dataset videos, the annotation is 2-class, with label 0 assigned to ‘person’ class and 1 to ‘athlete’ class, but it is not exhaustive, i.e., there may be unannotated objects in some frames. The labels are provided in files with the following format:

    • frameN
    • #objects
    • 0 or 1

similar to the annotation files.

Final_bicycles
A dataset for bicycle detection/tracking was created, consisting of 6 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in the text files with the format:

    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • P. Nousi, I. Mademlis, I. Karakostas, A. Tefas and I.Pitas, “Embedded UAV Real-time Visual Object Detection and Tracking”, Proceedings of the IEEE International Conference on Real-time Computing and Robotics (RCAR), 2019 (DOI: 10.1109/RCAR47638.2019.9043931).
Final_boats
A dataset for rowing boat detection/tracking was created, consisting of 5 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in the text files with the format:

    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • F. Patrona, P. Nousi, I. Mademlis, A.Tefas and I.Pitas, “Visual Object Detection For Autonomous UAV Cinematography”, Proceedings of the Northern Lights Deep Learning Workshop (NLDL), 2020 (DOI: 10.7557/18.5099).
Final_single_boats
A dataset for single boat detection/tracking was created, consisting of 5 HD videos, at 50 or 25 frames per second. An annotation file is included along with each video file. The annotations are stored in the text files with the format:

    • frameN
    • #objects
    • x y w h

where x, y indicate the upper left corner of the bounding box and w, h describe its width and height in frame N.

Anyone using any part of this dataset in their work is kindly asked to additionally cite the following paper:

  • F. Patrona, P. Nousi, I. Mademlis, A.Tefas and I.Pitas, “Visual Object Detection For Autonomous UAV Cinematography”, Proceedings of the Northern Lights Deep Learning Workshop (NLDL), 2020 (DOI: 10.7557/18.5099).
UAV_BothKamp
A UAV dataset for parkour athlete detection was created by annotating the footage acquired during MULTIDRONE experimental media production. It consists of 6 videos (1920 x 1080) at 50 frames per second. The annotations provided are not exhaustive, i.e., there may be unannotated objects in some frames, and they are stored in text files with the following format:

    • frameN
    • #objects
    • x y w h

where x, y denote the upper left corner of the bounding box and w, h its width and height.

Multiview_Boats_Bothcamp
This dataset depicts a sample of a rowing race with three row boats in Bothkamp, Germany (September 2019). The footage was captured at 50 FPS and at a resolution of 1920×1080, using two UAVs filming simultaneously from different positions and view angles. The footage from the two UAVs is contained in two separate RGB video files, losslessly compressed with the Lagarith codec. No annotation is currently available.
Multiview_Synthetic_UAV
This dataset contains sequences generated by simulating 3 camera-equipped UAVs flying simultaneously under specific UAV/camera motion types (CMTs) and framing shot types (FSTs), while filming a simulated bicycle race. Each sequence may include up to 10 cyclists, differing only in the color of their jerseys. The UAVs fly in three setups (a 3-UAV ORBIT, a 2-UAV CHASE plus 1-UAV VTS, and a 3-UAV TRACK), according to the MULTIDRONE UAV shot type taxonomy. The Unreal Engine 4-based AirSim simulation environment was employed for constructing the sequences. The evaluation dataset contains more than 90000 video frames, at a resolution of 640 x 360 pixels and a framerate of 25 FPS, while each video is more than 6.5 minutes long. Temporally synchronized ground truth is provided for all camera parameters, 3D target positions and the corresponding 2D bounding boxes across all sequences.

RAI/Deutsche Welle Multidrone Datasets

To acquire these datasets, please complete and sign this license agreement. Subsequently, email it to Alberto Messina so as to receive FTP credentials for downloading.

If you are granted access, the following (non-annotated) datasets are available.

IGA_2017
The footage was taken during the International Horticultural Exhibition (IGA) in Berlin, June 2017. The drone used was a Mavic Pro; parts of the footage have been published in Deutsche Welle’s Internet format ‘Daily Drone’: https://www.youtube.com/watch?v=MBgjr3ua554.

WUENSDORF_2017
The footage showing a former Soviet base in the Federal State of Brandenburg, Germany, was shot with one Inspire 2 and one Mavic Pro (July 2017). The footage was used to create another Daily Drone clip https://www.youtube.com/watch?v=IIwQmGsXTNs. It shows the remains of the Soviet barracks and a Lenin statue.

MUENCHEBERG_2017
The footage was taken in Muencheberg, Brandenburg, Germany, in October 2017, using one Inspire 2 and one Mavic Pro. In total, 29 clips were produced focusing on the MULTIDRONE Camera Motion Types taxonomy. The dataset includes the clips and the associated flight records.

MUENCHEBERG_2018
One Inspire 2, one Mavic Pro and one Phantom 4 were used by a Deutsche Welle team to film a group of cyclists simulating a bicycle race, in Muencheberg, Brandenburg, Germany, during May 2018. The shoot was accompanied by colleagues from the University of Bristol who created simulations of such a bike race prior to the actual shooting. The parameters of these simulations such as flight altitude, camera angle, etc., were used during the recording of the race. Flight records are provided.

NAUEN_2018
One football player and one cyclist were filmed with one Inspire 2 and one Mavic Pro in Nauen, Germany, during April 2018. The shooting focused on a subset of the UAV Camera Motion Types identified in the MULTIDRONE UAV shot type taxonomy (Lateral Tracking Shot, Vertical Tracking Shot, Pedestal/Elevator Shot With Target, Chase/Follow Shot, Orbit). The dataset contains 19 clips and their associated flight records.

GIRO_2017
This dataset consists of 9 clips taken from the 2017 edition of the Giro d’Italia, at 1920×1080 resolution, in MP4 format, at 25 frames per second.

GIRO_2018
This dataset consists of 26 clips taken from the 2018 edition of the Giro d’Italia, at 1920×1080 resolution, in MP4 format, at 25 frames per second.

ARCHIVE_2018
This dataset consists of 36 clips taken from the RAI archives, depicting various shots of bikers, football players, boat racers and other outdoor sports (ski, sailing). The resolution varies from 720×576 to 1920×1080, depending on the stored copy in the archive.

METEORA_2018
This dataset contains UAV footage filmed for Deutsche Welle’s “Euromaxx – Lifestyle in Europe”, in the mountains of Meteora, Greece, in August 2018. The footage mainly depicts rock climbing and it was shot using two drone models (a Mavic Air and an Inspire 2), as well as a variety of different shot perspectives, movements and angles.

WANNSEE_2018
This dataset contains UAV footage filmed by a Deutsche Welle team during the live rowing regatta “Rund um Wannsee” of 2018, one of the longest races in the world, set in the southwest of Berlin. Two drone teams were stationed along the track, a third drone was used for aerial overview, and two additional standard camera teams covered the rest.

WANNSEE_2018_Test
This dataset contains UAV footage filmed by a Deutsche Welle team before the live rowing regatta “Rund um Wannsee” of 2018. Three drones were employed (an Inspire, a Mavic Air and a Mavic Pro), with flight records provided.

CYCLISTS_2019
This is a dataset depicting a bicycle race training session in northern Italy (May 2019). The footage was filmed by RAI, using a DJI Phantom UAV flying above the bikers.

YOUTUBE_Drone_Footage
This is a dataset consisting of a list of links to roughly 10 hours of YouTube drone footage of soccer, rowing and cycling.