Unique Classification

Dependent Models

The outputs of the following pipelines are used to determine any Unique Classification:

Pipeline Name	Pipeline Output Type
Image Preparation	Object Detection
Depth Registration	Semantic or Instance Segmentation
Unique Classification	Classification

Defining Domain Search and Signature Search

Different geological or geotechnical features need to be identified and modelled at different scales, depending on how the data will be used downstream. A lithology log, for example, requires every metre along a drill hole to be assigned a rock type, whereas a vein log identifies discrete intervals of a particular vein mineral or style.

To achieve similar outputs using a machine learning classification product, Datarock uses what are referred to as Domain Search and Signature Search models. These are defined as follows:

Domain Search: classification model trained at a resolution suited to interpreting large-scale geological features (e.g. lithology/rock type, alteration, weathering). Achieved by training the model on 1m interval row imagery sourced from the Datarock Platform.
Signature Search: classification model trained at a resolution suited to identifying fine, centimetre-scale geological features (e.g. veins, breccia, minerals). Achieved by training the model on 5cm square (tile) imagery sourced from the Datarock Platform.

Whilst the overall workflow is unchanged between the two types of classification, the different resolutions allow for greater flexibility and accuracy in creating a product that best suits the customer's requirements and the output resolution they need.

Final Domain Search and Signature Search products are named using the convention Feature (Domain search) or Feature (Signature search). For example, Lithology (Domain search) or Breccia (Signature search).

Data Processing

When the Datarock Unique Classification model is applied to depth-registered rows, its outputs allow classification labels to be calculated, including the dominant label and the class probability statistics of the predictions.

For Domain Search classification models, the depth-registered rows are used as they come out of the Platform (if the row length is close to or equal to 1m) or are otherwise merged into 1m interval images.

For Signature Search classification models, the depth-registered rows are cut into 5cm squares ('tiles') using customisable clipping and overlap parameters. The following settings are used:

Clip the top and bottom of each row image by 15%
- This removes influences from the edge of the tray
Overlap the squares by 20%
- This allows for capturing geological structure and texture at tile boundaries that could otherwise be unaccounted for.
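
The clipping and overlap settings above can be sketched in a few lines of Python. This is only an illustration and not the Platform's implementation: the function names are hypothetical, the overlap is assumed to apply along the length of the row, and tile positions are expressed in metres from the start of the row.

```python
def tile_spans(row_length_m, tile_m=0.05, overlap=0.20):
    """Return (start, end) offsets in metres for each tile along a row.

    Assumption: consecutive tiles advance by tile_m * (1 - overlap),
    i.e. 0.04 m for 5cm tiles with a 20% overlap.
    """
    step = tile_m * (1.0 - overlap)
    spans = []
    i = 0
    # Small tolerance so a tile ending exactly at the row end is kept.
    while i * step + tile_m <= row_length_m + 1e-9:
        start = i * step
        spans.append((round(start, 4), round(start + tile_m, 4)))
        i += 1
    return spans


def clip_bounds(row_height_px, clip=0.15):
    """Return the (top, bottom) pixel rows kept after trimming each edge."""
    margin = round(row_height_px * clip)
    return margin, row_height_px - margin


print(tile_spans(0.13))   # three overlapping tiles on a 13cm row
print(clip_bounds(200))   # pixel band kept from a 200px-high row image
```

With these assumptions, each tile shares its last centimetre with the start of the next one, which is how texture at a tile boundary is seen by two tiles rather than being split between them.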

Example: how the squares appear on depth-registered rows.

Illustration of the difference between a classification model and its potential outputs against an object detection, semantic segmentation (single-class) and instance segmentation (multi-class) model.

Detection of Unique Classification classes

A Unique Classification model can detect several classes. Training examples for these classes come from manual labelling of tiles or interval imagery during onboarding, from sampling tiles or intervals based on geological or geotechnical logs, or from UMAP using the textural and colour differences of these tiles or intervals. The latter two can assist in further augmenting the training process.

These annotated tiles or intervals are split into training and evaluation datasets.
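
One common way to perform such a split is per class, so that every class appears in both datasets. The sketch below illustrates that idea only; the source does not state how Datarock splits its data, and the function name, the 20% evaluation fraction and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labelled_items, eval_fraction=0.2, seed=0):
    """Split (item_id, label) pairs into training and evaluation lists.

    Assumption: each class contributes roughly eval_fraction of its
    items to the evaluation set (at least one when it has two or more).
    """
    by_class = defaultdict(list)
    for item_id, label in labelled_items:
        by_class[label].append(item_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, evaluation = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        n_eval = max(1, round(len(ids) * eval_fraction)) if len(ids) > 1 else 0
        evaluation += [(i, label) for i in ids[:n_eval]]
        train += [(i, label) for i in ids[n_eval:]]
    return train, evaluation


# Hypothetical annotated tiles for two classes.
items = [(f"tile_{i}", "granite") for i in range(10)]
items += [(f"tile_{i}", "basalt") for i in range(10, 15)]
train, evaluation = stratified_split(items)
print(len(train), len(evaluation))
```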

Example: compilation of training tiles for each class of a 10-class Lithology (Signature search) classification model.

Example: compilation of training intervals for each class of a 4-class Lithology (Domain search) classification model.

Product Configuration Options

There are no configurable options for this product.

Output Intervals

Default interval lengths: for Signature Search classification products, raw data is produced at 0.05m scale as well as composited intervals of 0.5m, 1m, 2m, 3m and 5m. For Domain Search classification products, the raw data is produced at a 1m scale or other fixed intervals, depending on the user's needs.

Are customisable intervals available? Yes, by uploading an assay or geology/geotechnical logging table to the Datarock Platform or by sending it to the Customer Success Team.

User Data

Sampling Intervals

For customisable intervals (assay or geology logs), user data may be uploaded to the Platform as CSV files in the following format:

HOLEID_sampling_intervals_lithology.csv
HOLEID_sampling_intervals_alteration.csv
HOLEID_sampling_intervals_mineral.csv

Column Header	Description
depth_from	Start of interval
depth_to	End of interval
groundtruth	Class name (e.g. lithology class, alteration class, mineral class - depending on the model)
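
A file in this format could be produced with Python's standard csv module, as in the minimal sketch below. The function name, the hole ID DH001, the class names and the depth values are hypothetical, and depths are assumed to be in metres.

```python
import csv
import io


def sampling_intervals_csv(intervals):
    """Return CSV text with the three column headers listed above.

    `intervals` is an iterable of (depth_from, depth_to, groundtruth).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["depth_from", "depth_to", "groundtruth"])
    for depth_from, depth_to, label in intervals:
        if depth_to <= depth_from:
            raise ValueError(f"empty or reversed interval: {depth_from}-{depth_to}")
        writer.writerow([depth_from, depth_to, label])
    return buffer.getvalue()


# Hypothetical lithology log for a hole with ID DH001.
text = sampling_intervals_csv([(0.0, 12.5, "granite"), (12.5, 30.0, "basalt")])
with open("DH001_sampling_intervals_lithology.csv", "w", newline="") as f:
    f.write(text)
```

Checking that each interval has a positive length before writing is a design choice of this sketch, not a documented Platform requirement.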

Data Output

Development is currently underway to enable viewing Domain Search predictions in Datarock Core.

Results from Signature Search products can be viewed in the Results tab of the Platform, overlain on core imagery, only when the model is trained on 5cm square tiles. If the model is trained on a different tile size (e.g. 1m or row tiles), results cannot currently be visualised in the Platform, but can be exported from it.

Example: Lithology Classification Signature Search model classes visualised on the Platform

Results from this model can be obtained using the Downloads dropdown list:

The available CSV files include the following for the drill hole(s) selected:

Signature Search:

ProjectID_classificationProduct_square.csv
ProjectID_classificationProduct_square_all_classes.csv*
ProjectID_classificationProduct_interval.csv
ProjectID_classificationProduct_composite_Xm.csv
ProjectID_classificationProduct_composite_segment_Xm.csv
ProjectID_classificationProduct_composite_user_intervals.csv**

Domain Search:

HoleID_classificationProduct.csv
HoleID_classificationProduct_by_user_interval.csv**

* only available for download from the Platform and not via the Public API
** only if sampling intervals have been uploaded to the Platform or sent to the Customer Success Team

Composite Files

ProjectID_classificationProduct_composite_Xm.csv
ProjectID_classificationProduct_composite_segment_Xm.csv
ProjectID_classificationProduct_composite_user_intervals.csv

Column Header	Description
hole_id	Customer's Hole ID
depth_from_m	Start of interval (metres)
depth_to_m	End of interval (metres)
groundtruth	Logged class name as defined by the groundtruth column in the uploaded HOLEID_sampling_intervals_product.csv
dominant_class	The most likely/majority class predicted by the model within the selected interval
length_interval	Length of the interval as defined by depth_from_m and depth_to_m
length_response	Length of core detected within the interval defined by depth_from_m and depth_to_m
length_valid	Length of detected core within the interval defined by depth_from_m and depth_to_m that was used for prediction
class name	One column per trained class, named after that class. Each row gives the predicted proportion of each class for that interval as a percentage (the class columns add up to 100%).
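
To illustrate how a dominant class and per-class percentages relate to tile-level predictions, the sketch below composites a handful of hypothetical tiles into one interval. It is not the Platform's algorithm: the function name is made up, each tile is assigned to an interval by its midpoint, and every tile counts equally regardless of overlap.

```python
from collections import Counter


def composite(tiles, depth_from_m, depth_to_m):
    """Return (dominant_class, {class: percent}) for one interval.

    `tiles` is an iterable of (start_m, end_m, predicted_class).
    Assumption: a tile belongs to the interval if its midpoint does.
    """
    counts = Counter(
        label
        for start, end, label in tiles
        if depth_from_m <= (start + end) / 2 < depth_to_m
    )
    total = sum(counts.values())
    if total == 0:
        return None, {}
    proportions = {label: 100.0 * n / total for label, n in counts.items()}
    # Ties are broken alphabetically so the result is deterministic.
    dominant = max(sorted(proportions), key=proportions.get)
    return dominant, proportions


# Hypothetical tile predictions along a drill hole.
tiles = [
    (0.00, 0.05, "vein"),
    (0.04, 0.09, "vein"),
    (0.08, 0.13, "breccia"),
    (1.00, 1.05, "breccia"),
]
dominant, proportions = composite(tiles, 0.0, 1.0)
print(dominant, proportions)
```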

Raw Data Files