
5. Datasets

5.1. Downloading Datasets

You can use the "download_dataset" method in the Python SDK to download the datasets that were created in the Dataset Manager App. The following information is needed:

download_dataset(version_id, export_type, custom_download_path, is_media_include)

Parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| version_id | string | - | ID of the dataset version |
| export_type | string | - | Dataset export format: 'RAW', 'YOLO Darknet' or 'Semantic Segmentation' |
| custom_download_path (Optional) | string | empty | If given, the images are downloaded to this location; otherwise they are downloaded to a directory within the current directory. Note that this must be an absolute path. |
| is_media_include (Optional) | boolean | True | If True, both the annotation data and the associated media files are downloaded. If False, only the annotation data is downloaded and the media files are skipped. |
1. ID of the dataset version: version_id
2. Dataset export format: export_type

You can copy these values from the LayerNext Dataset Manager tool by accessing the ‘Download dataset using the Python SDK’ option as shown below:

Product UI showing steps to download the dataset

Example Usage

client.download_dataset("635eafbec1a605ab795d2768", "YOLO Darknet")
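Before calling download_dataset, it can be useful to sanity-check the arguments on the client side. The following is a minimal sketch (check_download_args is a hypothetical helper, not part of the LayerNext SDK; the supported format names are those listed in the parameters table):

```python
# Hypothetical pre-flight check for download_dataset arguments
# (not part of the LayerNext SDK).
SUPPORTED_EXPORT_TYPES = {"RAW", "YOLO Darknet", "Semantic Segmentation"}

def check_download_args(version_id: str, export_type: str) -> None:
    """Raise ValueError if the arguments are obviously invalid."""
    if not version_id:
        raise ValueError("version_id must be a non-empty string")
    if export_type not in SUPPORTED_EXPORT_TYPES:
        raise ValueError(
            f"export_type must be one of {sorted(SUPPORTED_EXPORT_TYPES)}, "
            f"got {export_type!r}"
        )

# Passes silently for a well-formed call:
check_download_args("635eafbec1a605ab795d2768", "YOLO Darknet")
```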

5.2. Create a Dataset from a Collection

This SDK function creates a dataset from all or a subset of frames in a given collection in the DataLake.

create_dataset_from_collection(dataset_name: str, collection_id: str, split_info: dict, labels: list, export_types: list, query: str, filter: dict, operation_list: list, augmentation_list: list)

Parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| dataset_name | string | - | Dataset name (must be non-empty) |
| collection_id | string | - | ID of the collection from which data is taken to create the dataset |
| split_info | dictionary | - | Percentage of frames assigned to each portion of the dataset: train, test and validation. Given as a dictionary with train/test/validation as keys and percentages as values. |
| labels | list | - | Labels belonging to the dataset. At least one label must be given. |
| export_types (Optional) | list | [] | List of formats to which exports should be generated. Available export types are "RAW", "YOLO Darknet" and "Semantic Segmentation". |
| query (Optional) | string | empty | Search query that filters the items in the collection (the same query format used in the Data Lake frontend). |
| filter (Optional) | object | - | Additional criteria can be specified in the filter object, similar to annotation project creation. |
| operation_list (Optional) | list | None | IDs of the annotation projects or model runs to include in the dataset. If None, all available projects or model runs associated with the selected images are taken. If empty, no annotations are included. |
| augmentation_list (Optional) | object | None | Dictionary containing the configurations for the augmentation types enabled for this dataset. Details of the augmentation configuration are given in section 5.6. |

Example Usage

#This will create a dataset called "Balloons Dataset" from all frames in a collection and export to RAW and Semantic Segmentation
client.create_dataset_from_collection("Balloons Dataset", "<datalake_collection_id>", {"train":100, "test":0, "validation":0}, ["Balloon"], ["RAW","Semantic Segmentation"])
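The split_info percentages are expected to cover the whole dataset. A minimal client-side check before calling the SDK might look like this (validate_split_info is a hypothetical helper, not part of the LayerNext SDK):

```python
# Hypothetical validator for the split_info dictionary
# (not part of the LayerNext SDK): train/test/validation
# percentages should sum to 100.
def validate_split_info(split_info: dict) -> dict:
    expected_keys = {"train", "test", "validation"}
    if set(split_info) != expected_keys:
        raise ValueError(f"split_info keys must be exactly {sorted(expected_keys)}")
    total = sum(split_info.values())
    if total != 100:
        raise ValueError(f"split percentages must sum to 100, got {total}")
    return split_info

# The split used in the example above passes the check:
validate_split_info({"train": 100, "test": 0, "validation": 0})
```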

5.3. Create a Dataset Without Specifying a Collection

This SDK function creates a dataset from a subset of frames in the DataLake without specifying a collection.

create_dataset_from_datalake(dataset_name: str, split_info: dict, labels: list, export_types: list, item_type: str, query: str, filter: dict, operation_list: list, augmentation_list: list)

Parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| dataset_name | string | - | Dataset name (must be non-empty) |
| split_info | dictionary | - | Percentage of frames assigned to each portion of the dataset: train, test and validation. Given as a dictionary with train/test/validation as keys and percentages as values. |
| labels | list | - | Labels belonging to the dataset. At least one label must be given. |
| export_types (Optional) | list | [] | List of formats to which exports should be generated. Available export types are "RAW", "YOLO Darknet" and "Semantic Segmentation". |
| item_type | string | image | Valid values are "image", "image_collection" or "dataset" |
| query (Optional) | string | empty | Search query that filters the items in the Data Lake (the same query format used in the Data Lake frontend). |
| filter (Optional) | object | - | Additional criteria can be specified in the filter object, similar to annotation project creation. |
| operation_list (Optional) | list | None | IDs of the annotation projects or model runs to include in the dataset. If None, all available projects or model runs associated with the selected images are taken. If empty, no annotations are included. |
| augmentation_list (Optional) | object | None | Dictionary containing the configurations for the augmentation types enabled for this dataset. Details of the augmentation configuration are given in section 5.6. |

Example Usage

#This will create a dataset called "Balloons Dataset - V2" from all images with Balloon annotations and export to Semantic Segmentation
client.create_dataset_from_datalake("Balloons Dataset - V2", {"train":100, "test":0, "validation":0}, ["Balloon"], ["Semantic Segmentation", "YOLO Darknet"], "image", "annotation.label=Balloon")
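The query string in the example follows the "annotation.label=&lt;name&gt;" form shown above. A small helper for composing such queries over several labels is sketched below (label_query is hypothetical, and the "or" combinator is an assumption about the Data Lake query syntax; verify it against the frontend before relying on it):

```python
# Hypothetical builder for Data Lake label queries (not part of the
# LayerNext SDK). The "annotation.label=<name>" form follows the
# example above; combining terms with "or" is an assumption.
def label_query(labels: list) -> str:
    return " or ".join(f"annotation.label={label}" for label in labels)

label_query(["Balloon"])          # 'annotation.label=Balloon'
label_query(["Balloon", "Kite"])  # 'annotation.label=Balloon or annotation.label=Kite'
```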

5.4. Update a Dataset Version with Images from a Collection

This SDK function updates an existing dataset with new data from a given collection.

update_dataset_version_from_collection(dataset_id: str, version_id: str, collection_id: str, split_info: dict, labels: list, export_types: list, query: str, filter: dict, is_new_version_required: bool, operation_list: list, augmentation_list: list)

Parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| dataset_id | string | - | Dataset ID |
| version_id | string | - | ID of the current dataset version (source version) |
| collection_id | string | - | ID of the collection from which data is taken to update the dataset |
| split_info | dictionary | - | Percentage of frames assigned to each portion of the dataset: train, test and validation. Given as a dictionary with train/test/validation as keys and percentages as values. |
| labels | list | - | Labels belonging to the dataset. At least one label must be given. |
| export_types (Optional) | list | [] | List of formats to which exports should be generated. Available export types are "RAW", "YOLO Darknet" and "Semantic Segmentation". |
| query (Optional) | string | empty | Search query that filters the items in the collection (the same query format used in the Data Lake frontend). |
| filter (Optional) | object | - | Additional criteria can be specified in the filter object, similar to annotation project creation. |
| is_new_version_required (Optional) | boolean | - | True if a new version of the dataset should be created |
| operation_list (Optional) | list | None | IDs of the annotation projects or model runs to include in the dataset. If None, all available projects or model runs associated with the selected images are taken. If empty, no annotations are included. |
| augmentation_list (Optional) | object | None | Dictionary containing the configurations for the augmentation types enabled for this dataset. Details of the augmentation configuration are given in section 5.6. |

Example Usage

# This will update an existing dataset from all frames in a collection and create a new version
client.update_dataset_version_from_collection("<dataset_id>", "<current_dataset_version_id>", "<datalake_collection_id>", {"train":100, "test":0, "validation":0}, ["Balloon"], ["RAW","Semantic Segmentation"], "", {}, True)

5.5. Update a Dataset Version from Frames Without Specifying a Collection

This SDK function updates a dataset from a subset of frames in the DataLake without specifying a collection.

update_dataset_version_from_datalake(dataset_version_id: str, split_info: dict, labels: list, export_types: list, item_type: str, query: str, filter: dict, is_new_version_required: bool, operation_list: list, augmentation_list: list)

Parameters

| Parameter | Data type | Default | Description |
| --- | --- | --- | --- |
| dataset_version_id | string | - | ID of the current dataset version (source version) |
| split_info | dictionary | - | Percentage of frames assigned to each portion of the dataset: train, test and validation. Given as a dictionary with train/test/validation as keys and percentages as values. |
| labels | list | - | Labels belonging to the dataset. At least one label must be given. |
| export_types (Optional) | list | [] | List of formats to which exports should be generated. Available export types are "RAW", "YOLO Darknet" and "Semantic Segmentation". |
| item_type | string | image | Valid values are "image", "image_collection" or "dataset" |
| query (Optional) | string | empty | Search query that filters the items in the Data Lake (the same query format used in the Data Lake frontend). |
| filter (Optional) | object | - | Additional criteria can be specified in the filter object, similar to annotation project creation. |
| is_new_version_required (Optional) | boolean | - | True if a new version of the dataset should be created |
| operation_list (Optional) | list | None | IDs of the annotation projects or model runs to include in the dataset. If None, all available projects or model runs associated with the selected images are taken. If empty, no annotations are included. |
| augmentation_list (Optional) | object | None | Dictionary containing the configurations for the augmentation types enabled for this dataset. Details of the augmentation configuration are given in section 5.6. |

Example Usage

client.update_dataset_version_from_datalake("<dataset_version_id>", {"train":60, "test":30, "validation":10}, ["Balloon"], ["RAW"], "image", "annotation.label=Balloon")
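As a guard against typos in the item_type argument, a small client-side check can be used. This is a hypothetical helper, not part of the SDK; the valid values are those listed in the parameters table:

```python
# Hypothetical guard for the item_type argument
# (not part of the LayerNext SDK).
VALID_ITEM_TYPES = ("image", "image_collection", "dataset")

def normalize_item_type(item_type: str = "image") -> str:
    """Return item_type if valid, mirroring the documented default "image"."""
    if item_type not in VALID_ITEM_TYPES:
        raise ValueError(
            f"item_type must be one of {VALID_ITEM_TYPES}, got {item_type!r}"
        )
    return item_type
```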

5.6. Configuring Augmentations

When creating or updating a dataset, the configuration of augmentation options should be provided as a dictionary that represents a JSON structure. The JSON format should follow the structure shown below.

{
  "<AUGMENTATION_CATEGORY>": [
    {
      "id": "<AUGMENTATION_TYPE_ID>",
      "properties": [
        {
          "id": "<PROPERTY_ID1>",
          "values": [
            <VALUE_1>,
            <VALUE_2>
          ]
        },
        {
          "id": "<PROPERTY_ID2>",
          "values": [
            <VALUE_1>
          ]
        }
      ]
    },
    {
      <Configuration for next augmentation type>
    }
  ]
}

Note that LayerNext currently supports only one augmentation category: IMAGE_LEVEL.

The available augmentation types and their properties are listed in the table below.

| Type ID | Property ID | Description | Value type | Valid values or value ranges |
| --- | --- | --- | --- | --- |
| FLIP_IMAGE | FLIP_HORIZONTAL | If True, a horizontal flip is applied; otherwise no operation is applied. | Boolean | True, False |
| FLIP_IMAGE | FLIP_VERTICAL | If True, a vertical flip is applied; otherwise no operation is applied. | Boolean | True, False |
| IMAGE_ROTATION | PERCENTAGE_SCALE | The image is rotated at an angle within the given minimum and maximum limits. | Array of Integer | Two angles within the range -360 to 360 (e.g. -10, 30) |
| IMAGE_BLUR | BLUR | Gaussian blurring is applied with a radius within the specified minimum and maximum limits. | Array of Integer | Minimum and maximum radius within the range 0 to 15 (e.g. 10, 12) |
| NINETY_ROTATION | CLOCKWISE | The image is rotated 90° clockwise if True; otherwise no operation is applied. | Boolean | True, False |
| NINETY_ROTATION | COUNTER_CLOCKWISE | The image is rotated 90° anticlockwise if True; otherwise no operation is applied. | Boolean | True, False |
| NINETY_ROTATION | UPSIDE_DOWN | The image is rotated upside down if True; otherwise no operation is applied. | Boolean | True, False |
| GRAYSCALE | GRAYSCALE_PERCENTAGE | Minimum and maximum percentage of grayscale applied. | Array of Integer | Minimum and maximum percentage within the range 0 to 100 |
| HUE | HUE_DEGREES | Minimum and maximum degree of hue applied. | Array of Integer | Pair of degrees within the range -50 to 50 |
| SATURATION | SATURATION_DEGREES | Minimum and maximum degree of saturation applied. | Array of Integer | Pair of degrees within the range -50 to 50 |
| BRIGHTNESS | BRIGHTNESS_DEGREES | Minimum and maximum degree of brightness applied. | Array of Integer | Pair of degrees within the range -50 to 50 |
| NOISE | NOISE_PERCENTAGE | Minimum and maximum percentage of noise applied. | Array of Integer | Pair of percentages within the range 0 to 100 |
| SHEAR | SHEAR_HORIZONTAL | Minimum and maximum horizontal shear angle. | Array of Integer | Pair of angles within the range -40 to 40 |
| SHEAR | SHEAR_VERTICAL | Minimum and maximum vertical shear angle. | Array of Integer | Pair of angles within the range -40 to 40 |
| CROP | CROP_PERCENTAGE | Minimum and maximum percentage of crop applied. | Array of Integer | Pair of percentages within the range 0 to 100 |
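The value ranges above can be checked on the client side before submitting a configuration. The following sketch is a hypothetical validator, not part of the SDK; the range constants simply mirror the table:

```python
# Hypothetical range check for paired augmentation property values
# (not part of the LayerNext SDK). Ranges mirror the table above.
PROPERTY_RANGES = {
    "PERCENTAGE_SCALE": (-360, 360),
    "BLUR": (0, 15),
    "GRAYSCALE_PERCENTAGE": (0, 100),
    "HUE_DEGREES": (-50, 50),
    "SATURATION_DEGREES": (-50, 50),
    "BRIGHTNESS_DEGREES": (-50, 50),
    "NOISE_PERCENTAGE": (0, 100),
    "SHEAR_HORIZONTAL": (-40, 40),
    "SHEAR_VERTICAL": (-40, 40),
    "CROP_PERCENTAGE": (0, 100),
}

def check_property_values(property_id: str, values: list) -> None:
    """Raise ValueError if values is not a valid [min, max] pair."""
    low, high = PROPERTY_RANGES[property_id]
    if len(values) != 2:
        raise ValueError(f"{property_id} expects a [min, max] pair")
    min_v, max_v = values
    if not (low <= min_v <= max_v <= high):
        raise ValueError(
            f"{property_id} values must satisfy {low} <= min <= max <= {high}"
        )

# The rotation limits used later in this section pass the check:
check_property_values("PERCENTAGE_SCALE", [-10, 30])
```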

Example Augmentation configuration dictionary

{
  "IMAGE_LEVEL": [
    {
      "id": "FLIP_IMAGE",
      "properties": [
        {
          "id": "FLIP_HORIZONTAL",
          "values": [
            False
          ]
        },
        {
          "id": "FLIP_VERTICAL",
          "values": [
            True
          ]
        }
      ]
    },
    {
      "id": "IMAGE_ROTATION",
      "properties": [
        {
          "id": "PERCENTAGE_SCALE",
          "values": [
            -10,
            30
          ]
        }
      ]
    },
    {
      "id": "GRAYSCALE",
      "description": "",
      "isSelected": True,
      "properties": [
        {
          "id": "GRAYSCALE_PERCENTAGE",
          "values": [
            5,
            10
          ]
        }
      ]
    }
  ]
}
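Assembling this nested structure by hand is error-prone. A small builder can generate it from a flat mapping; augmentation_config is a hypothetical helper, not part of the SDK, and it only targets the IMAGE_LEVEL category that LayerNext currently supports:

```python
# Hypothetical helper that assembles an augmentation_list dictionary
# in the structure shown in section 5.6 (not part of the LayerNext SDK).
def augmentation_config(types: dict) -> dict:
    """types maps an augmentation type ID to a {property_id: values} dict."""
    return {
        "IMAGE_LEVEL": [
            {
                "id": type_id,
                "properties": [
                    {"id": prop_id, "values": values}
                    for prop_id, values in props.items()
                ],
            }
            for type_id, props in types.items()
        ]
    }

# Reproduces the flip and rotation entries from the example above:
config = augmentation_config({
    "FLIP_IMAGE": {"FLIP_HORIZONTAL": [False], "FLIP_VERTICAL": [True]},
    "IMAGE_ROTATION": {"PERCENTAGE_SCALE": [-10, 30]},
})
```

The resulting dictionary can then be passed as the augmentation_list argument of the create/update functions in sections 5.2 to 5.5.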