Unique Classification
This document details Datarock’s Unique Classification products, including Domain Search and Signature Search.
Dependent Pipelines
The outputs of the following pipelines are used to determine any Unique Classification:
Pipeline Name | Pipeline Output Type |
---|---|
Image Preparation | Object Detection |
Depth Registration | Semantic or Instance Segmentation |
Unique Classification | Classification |
Defining Domain Search and Signature Search
Different geological or geotechnical features require identification and modelling at different scales, depending on the downstream use of the data. A lithology log, for example, requires every metre along a drill hole to be assigned a rock type, whereas a vein log identifies discrete intervals of a particular vein mineral or style.
To achieve similar outputs using a machine learning classification product, Datarock uses what are referred to as Domain Search and Signature Search models. These are defined as follows:
- Domain Search: a classification model trained at a resolution suited to large-scale geological features (e.g. lithology/rock type, alteration, weathering). This is achieved by training the model on 1m interval row imagery sourced from the Datarock Platform.
- Signature Search: a classification model trained at a resolution suited to fine, centimetre-scale geological features (e.g. veins, breccia, minerals). This is achieved by training the model on 5cm interval square ('tile') imagery sourced from the Datarock Platform.
Whilst the overall workflow is unchanged between the two types of classification, the different resolutions allow for greater flexibility and accuracy in creating a product that best suits the customer's requirements and resolution of outputs.
Final Domain Search and Signature Search products are named as follows: Feature (Domain/Signature search). For example, Lithology (Domain search) or Breccia (Signature search).
Data Processing
When applied over depth-registered rows, the Datarock Unique Classification model produces outputs from which classification labels can be calculated, including the dominant label and the class probability statistics of each prediction.
For Domain Search classification models, the depth-registered rows are used directly as produced by the Platform (if the row length is close to or equal to 1m) or are otherwise merged into 1m interval images.
For Signature Search classification models, the depth-registered rows are cut into 5cm squares ('tiles') using customisable clipping and overlap parameters, with the following settings:
- Clip the top and bottom of each row image by 15%
  - This removes influences from the edge of the tray.
- Overlap the squares by 20%
  - This captures geological structure and texture at tile boundaries that could otherwise be unaccounted for.
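The clipping and overlap settings above can be illustrated with a minimal Python sketch. It is not the Platform's implementation: it assumes the 20% overlap applies along the row (so consecutive tiles start 80% of a tile length apart), that the 15% clip applies to the pixel height of the row image, and that a trailing partial tile is simply dropped. The function names `tile_intervals` and `clip_bounds` are hypothetical.

```python
def tile_intervals(depth_from_m, depth_to_m, tile_m=0.05, overlap=0.20):
    """Return (start, end) depths of overlapping tiles along one row.

    Assumption: the overlap is measured along the row, so the step
    between tile starts is tile_m * (1 - overlap).
    """
    step = tile_m * (1 - overlap)
    tiles = []
    i = 0
    while True:
        start = depth_from_m + i * step
        end = start + tile_m
        # Simplification: stop before a tile would overhang the row end.
        if end > depth_to_m + 1e-9:
            break
        tiles.append((round(start, 4), round(end, 4)))
        i += 1
    return tiles


def clip_bounds(height_px, clip_fraction=0.15):
    """Return the (top, bottom) pixel rows kept after clipping both edges."""
    margin = round(height_px * clip_fraction)
    return margin, height_px - margin


# Example: a 0.2 m stretch of row imagery starting at 10.0 m depth,
# and a row image that is 200 pixels high.
example_tiles = tile_intervals(10.0, 10.2)
example_clip = clip_bounds(200)
```

With these assumptions, each 5cm tile shares 1cm with its neighbour, and a 200-pixel-high row keeps its central 140 pixels.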
An example of how the squares appear on depth-registered rows is shown below.
Below is an image illustrating the difference between a classification model and its potential outputs against an object detection, semantic segmentation (single-class) and instance segmentation (multi-class) model.
Detection of Unique Classification classes
Several classification classes can be detected by a Unique Classification model. Classes are defined through manual labelling of tiles or interval imagery during onboarding, through sampling of tiles or intervals based on geological or geotechnical logs, or via UMAP using the textural and colour differences of these tiles or intervals. The latter two can assist in further augmenting the training process.
These annotated tiles or intervals are split into training and evaluation datasets, and the following image shows a compilation of training tiles for each class of a 10-class Lithology (Signature search) classification model.
The following images show a compilation of training intervals for each class of a 4-class Lithology (Domain search) classification model.
Product Configuration Options
There are no configuration aspects to this product.
Output Intervals
Default interval lengths: for Signature Search classification products, raw data is produced at 0.05m scale as well as composited intervals of 0.5m, 1m, 2m, 3m and 5m. For Domain Search classification products, the raw data is produced at a 1m scale or other fixed intervals, depending on the user's needs.
Are customisable intervals available? Yes, by uploading an assay or geology/geotechnical logging table to the Datarock Platform or sending it to the Customer Success Team.
User Data
Sampling Intervals
User data may be uploaded to the platform via CSV in the following format for customisable intervals (assay or geology logs):
- HOLEID_sampling_intervals_lithology.csv
- HOLEID_sampling_intervals_alteration.csv
- HOLEID_sampling_intervals_mineral.csv
CSV file to contain the following headers:
File Header | Description |
---|---|
depth_from | Start of interval |
depth_to | End of interval |
groundtruth | Class name (i.e. lithology class, alteration class, mineral class - depending on the model) |
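As an illustration of the header layout above, the following Python sketch writes a sampling-intervals CSV with the three required headers. The class names ("Basalt", "Granite") and the helper name `write_sampling_intervals` are hypothetical, and the check that each interval ends deeper than it starts is an assumption added for the example, not a documented Platform rule.

```python
import csv
import io

REQUIRED_HEADERS = ["depth_from", "depth_to", "groundtruth"]


def write_sampling_intervals(rows, handle):
    """Write (depth_from, depth_to, groundtruth) rows to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_HEADERS)
    writer.writeheader()
    for depth_from, depth_to, groundtruth in rows:
        # Assumed sanity check: an interval must have positive length.
        if depth_to <= depth_from:
            raise ValueError(f"Invalid interval: {depth_from} to {depth_to}")
        writer.writerow(
            {
                "depth_from": depth_from,
                "depth_to": depth_to,
                "groundtruth": groundtruth,
            }
        )


# Example with made-up lithology intervals, written to an in-memory buffer.
buffer = io.StringIO()
write_sampling_intervals([(0.0, 1.5, "Basalt"), (1.5, 4.0, "Granite")], buffer)
csv_text = buffer.getvalue()
```

To produce a file for upload, the same function could be called with a handle from `open(path, "w", newline="")`, using one of the file names listed above.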
Data Output
Results from this class of model can be obtained using the Download Product Data artefacts option from the Actions button in the Model Review tab of Datarock. The available CSV files include the following for the drill hole(s) selected:
Signature Search CSVs:
- ProjectID_classificationProduct_square.csv
- ProjectID_classificationProduct_square_all_classes.csv*
- ProjectID_classificationProduct_interval.csv
- ProjectID_classificationProduct_composite_Xm.csv
- ProjectID_classificationProduct_composite_segment_Xm.csv
- ProjectID_classificationProduct_composite_user_intervals.csv**
Domain Search CSVs:
- HoleID_classificationProduct.csv
- HoleID_classificationProduct_by_user_interval.csv**
* only available for download from the Platform and not via the Public API
** only if sampling intervals have been uploaded to the Platform or sent to the Customer Success Team
These CSV files contain the following headers:
Composite Data Files:
- ProjectID_classificationProduct_composite_Xm.csv
- ProjectID_classificationProduct_composite_segment_Xm.csv
- ProjectID_classificationProduct_composite_user_intervals.csv

Column Header | Description |
---|---|
hole_id | Customer’s Hole ID |
depth_from_m | Start of interval (metres) |
depth_to_m | End of interval (metres) |
groundtruth | Logged class name as defined by the groundtruth column in the uploaded HOLEID_sampling_intervals_product.csv |
dominant_class | The most likely (majority) class predicted by the model within the selected interval |
length_interval | Length of interval as defined by depth_from and depth_to |
length_response | Length of core detected within the interval defined by depth_from and depth_to |
length_valid | Length of detected core, within the interval defined by depth_from and depth_to, that was used for prediction |
class name | One column per trained class, named after the class. Each row shows the predicted proportion of each class for that interval as a percentage (the values add up to 100%). |
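The relationship between the dominant class and the per-class percentage columns can be sketched in Python as follows. This is only one plausible way to composite tile predictions and is not confirmed as the Platform's method: it assumes each tile counts equally and is assigned to an interval by its midpoint. The function name `composite` and the example classes are hypothetical.

```python
from collections import Counter


def composite(tiles, depth_from_m, depth_to_m):
    """Summarise tile predictions that fall inside one composite interval.

    tiles: list of (tile_from_m, tile_to_m, predicted_class) tuples.
    Returns (dominant_class, {class_name: percentage}).
    """
    # Assumption: a tile belongs to the interval containing its midpoint.
    counts = Counter(
        cls
        for start, end, cls in tiles
        if depth_from_m <= (start + end) / 2 < depth_to_m
    )
    total = sum(counts.values())
    if total == 0:
        return None, {}
    proportions = {cls: 100.0 * n / total for cls, n in counts.items()}
    # On a tie, max() keeps the class that was encountered first.
    dominant = max(proportions, key=proportions.get)
    return dominant, proportions


# Example: four non-overlapping 5 cm tiles inside a 0.5 m composite interval.
example = [
    (0.00, 0.05, "Vein"),
    (0.05, 0.10, "Vein"),
    (0.10, 0.15, "Breccia"),
    (0.15, 0.20, "Vein"),
]
dominant_class, class_percentages = composite(example, 0.0, 0.5)
```

Here three of the four tiles are "Vein", so the dominant class is "Vein" and the percentages sum to 100.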
Raw Data Files (Signature Search and Domain Search):

Column Header | Description |
---|---|
hole_id | Customer’s Hole ID |
depth_from_m | Start of tile interval (metres) |
depth_to_m | End of tile interval (metres) |
box_number | Platform-assigned box number reference |
class/prediction | Class with the highest probability predicted for the square interval |
probability | Statistical confidence of the model in the class selected for the square |
inference_timestamp (only available in Signature Search CSVs) | The timestamp of when the model calculated the class |
classification_source (only available in Signature Search CSVs) | What was used to derive the class prediction (model) |
edited_by (only available in Signature Search CSVs) | Name of the author of any edits made to the class outcome (if applicable) |
version (only available in Signature Search CSVs) | Model version identifier |
class_probability (*only available in square_all_classes.csv and classificationProduct_by_user_interval.csv) | One column per trained class, named after the class. Each row shows the statistical confidence of the model in selecting each class for the square |
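Since the predicted class is described above as the class with the highest probability, the link between the per-class probability columns and the class/probability pair can be sketched as below. This is an illustrative Python example with made-up values. The helper name `top_class` is hypothetical, and it is an assumption that the reported probability equals the largest per-class value.

```python
def top_class(class_probabilities):
    """Return (class_name, probability) for the highest-probability class."""
    if not class_probabilities:
        raise ValueError("At least one class probability is required")
    name = max(class_probabilities, key=class_probabilities.get)
    return name, class_probabilities[name]


# Example: per-class probabilities for a single square (made-up values).
square = {"Vein": 0.72, "Breccia": 0.20, "Other": 0.08}
prediction, probability = top_class(square)
```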
Product Limitations
Limitations | Comments |
---|---|
Reliance on row detection and depth registration | The Unique Classification model is based on predicting geological features within a 1m interval or 5cm tile derived from row imagery. Because the model depends on depth-registered rows being identified, if a row is missed by the row model during Image Preparation, the tiles derived from that row will not have the classification model applied. |
Training is dependent on what can be seen within a tile image | Datarock’s Unique Classification model relies on classes being predicted from visually identifiable RGB features, some of which are too subtle to predict from a photo, particularly if the resolution is poor. When tiles are sampled at random during training, the tile imagery also provides no geological context for individual squares. For example, two red-coloured tiles/intervals could both be predicted as “oxide”, but one is from 1m depth in a surface hole (weathered rock and therefore a true positive) and the other is from 1,000m depth (hematite oxidation in an IOCG and therefore a false positive). Site-based logging can be used to assist in class training on these false positives; however, logging is not always at the same resolution as 5cm tiles. Tiles/intervals that fall at the end of rows may contain only a fraction of rock and can therefore confuse the model. To minimise model confusion, these examples are generally trained as an “Other” class, along with tiles that contain solely a core block or an empty tray. |
Training data must be representative of the whole area to which classification is to be applied | If new imagery or classification classes are introduced to the model, performance may decline because these examples were not included in training during onboarding. An initial model evaluation will need to be undertaken to assess the suitability of the model, in particular against any new imagery. Ideally, a new model version is trained to incorporate the new untrained tiles or classes. |
Document Version
Version | Date | Author | Rationale |
---|---|---|---|
1 | 11 August 2023 | C Brown | Initial release. |
2 | 03 January 2025 | M Fracchia | User Data update |
3 | 07 January 2025 | N Pittaway | Data Output update |
4 | 17 July 2025 | N Pittaway | Addition of Domain vs Signature search |