Frequently Asked Questions (FAQ)


--- Challenge ---

1.  What is CholecTriplet2022?
An endoscopic vision challenge on the recognition and localization of tool-tissue interactions in surgical videos, in the form of triplets. It is an upgrade of the CholecTriplet2021 challenge with an added localization task.

2.  What is a triplet?
A combination of {instrument,verb,anatomical target} that describes a surgical action.

3.  Who can participate?
Anyone who signs the challenge agreement except the members of the organizing lab.

4.  Will the challenge submission still be open after the submission deadline?
Likely yes, but only submissions made before the deadline will be eligible for awards.



--- Registration ---

1.  Why is my registration not yet approved?
For your registration to be approved, you must send a signed challenge contract. Check the Getting Started page for more details.

2.  Must every member of my team submit a signed contract?
One contract per team is sufficient. However, every member of the team must abide by the terms and conditions in the signed contract. The dataset obtained after signing this contract remains confidential and cannot be transferred to anyone outside the team.

3.  Is it mandatory to register as a team?
Yes.

4.  What if I am working alone, must I still register as a team?
Yes. A team can consist of just one person.

5.  Must every member of my team register on the challenge website?
Yes.

6.  When is the deadline for team registration?
All registrations end on July 1, 2022.



--- CholecT50 dataset ---

1.  What labels in the dataset should be used to train the model?
Triplet labels. The standalone instrument, verb, and target labels are provided as additional labels in case they can help your modeling. Their usage is optional and entirely depends on your proposed method. Localization can be learned by weak supervision, preferably using the instrument presence labels or the triplet labels.
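
For illustration, here is a minimal PyTorch-style sketch of multi-label training on the 100 binary triplet labels. The loss choice, tensor shapes, and variable names are assumptions for illustration, not challenge requirements:

    import torch
    import torch.nn as nn

    # Toy stand-ins: 8 frames, 100 triplet classes. In practice the logits
    # come from your network and the labels from the triplet annotations.
    batch_size, num_triplets = 8, 100
    logits = torch.randn(batch_size, num_triplets, requires_grad=True)
    labels = torch.randint(0, 2, (batch_size, num_triplets)).float()

    # Triplet recognition is multi-label, so an independent binary loss per
    # class is a natural (though not mandated) choice.
    criterion = nn.BCEWithLogitsLoss()
    loss = criterion(logits, labels)
    loss.backward()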

2.  I found some action triplets marked null for the verb & target components in the ground truth; is this an error?
No. Some clinically valid triplets are not among the 100 considered triplet classes, due to their occurrence frequency and clinical relevance to the considered procedure.

3.  How many types of null triplets are possible in the dataset?
The possible null triplet classes can be grouped into two as follows:
a.) Instrument-inclusive null: this occurs when there is no instrument in the frame, or when the instrument involved in the action is not from the valid classes in the dataset. In this case, the label is {null-instrument,null-verb,null-target}. Since triplet recognition is a multi-label classification problem, this class is true only when all other classes are negative in a frame (sketched after item b below). It is NOT included in the 100 triplet classes.

b.) Non-instrument-inclusive null: this occurs when a valid instrument class is present, but the verb or target involved in the action is from an invalid class, or the triplet combination is not in the considered 100 classes for the reason given in (2) above. In this case, we retain only the instrument presence label while the verb/target are marked null. There are 6 such classes: {grasper,null-verb,null-target}, {bipolar,null-verb,null-target}, {hook,null-verb,null-target}, {scissors,null-verb,null-target}, {clipper,null-verb,null-target}, and {irrigator,null-verb,null-target}.
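
To make case (a) concrete, here is a tiny sketch (variable names are illustrative) showing that the instrument-inclusive null is implied by, rather than encoded in, the 100-class label vector:

    import numpy as np

    # Binary ground-truth vector over the 100 considered triplet classes
    # for one frame (values illustrative).
    triplet_labels = np.zeros(100, dtype=int)

    # The instrument-inclusive null {null-instrument,null-verb,null-target}
    # is not one of the 100 classes: it holds exactly when every class is
    # negative in the frame.
    is_null_instrument_frame = not triplet_labels.any()
    print(is_null_instrument_frame)  # True for this all-negative frame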

4.  Is the train/val/test split different from the one used in the published papers?
Yes. For the challenge, we restrict the test set to videos that are not in the public domain. We recommend a 40/5 train/val split on the provided data, but participants are entirely free to define their own splits (an illustrative split is sketched below). The training set is the entire CholecT45 dataset [7].
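
For example, one purely illustrative way to carve a 40/5 train/val split out of the 45 training videos (the video ID naming and seed are placeholders; the official benchmark splits are defined in [5]):

    import random

    # Placeholder IDs for the 45 training videos; substitute the real
    # video names from the released dataset.
    video_ids = [f"VID{i:02d}" for i in range(1, 46)]

    random.seed(42)  # fixed seed so the split is reproducible
    random.shuffle(video_ids)
    train_videos, val_videos = video_ids[:40], video_ids[40:]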

5.  Is the challenge test dataset publicly available?
No. While the training data are 45 videos from the publicly available Cholec80 [1], the test set is a private dataset (not Cholec80) of the same type of surgery.

6.  I have observed that there are some black images in the dataset; are these images corrupted?
No. As a privacy protection measure, we zeroed out all images that display the faces of the clinicians or the patients. For temporal consistency reasons, we did not remove the zeroed frames from the dataset.
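
If the zeroed frames interfere with your training pipeline, one simple way to skip them is to test whether a frame is entirely black. A minimal sketch (the helper name and file handling are assumptions):

    import numpy as np
    from PIL import Image

    def is_zeroed(frame_path: str) -> bool:
        """True if the frame was blanked for privacy (every pixel is zero)."""
        return not np.asarray(Image.open(frame_path)).any()

    # Example: drop blanked frames from a (hypothetical) list of frame paths.
    # training_frames = [p for p in frame_paths if not is_zeroed(p)]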



--- My Challenge Methods ---

1.  What is the expected output of the model?
A model produces two types of output per frame:

  1.  A vector of N=100 probability scores for triplet recognition, in the format [score1, score2, ..., scoreN].
  2. A list of box-triplet pairings, one for each positive triplet instance, in the format: [[tripletID, instrumentID, confidence, x, y, w, h], [tripletID, instrumentID, confidence, x, y, w, h],  .  .  . ]
The model predictions for the final submission will be converted to a Python dict and saved as a JSON file, as sketched below. We will provide a guide for this during submission.
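
To make the two outputs concrete, here is a sketch of one frame's predictions packed into a Python dict and dumped to JSON. The dict schema, key names, and frame identifier are assumptions for illustration; the official schema will come with the submission guide:

    import json

    frame_id = "VID01/000042"  # illustrative frame identifier

    # Output 1: a vector of 100 probability scores for triplet recognition.
    recognition = [0.0] * 100
    recognition[5] = 0.91  # e.g., triplet class 5 predicted present

    # Output 2: box-triplet pairings, one per positive triplet instance,
    # in the format [tripletID, instrumentID, confidence, x, y, w, h].
    detection = [[5, 0, 0.91, 0.40, 0.35, 0.20, 0.25]]

    predictions = {frame_id: {"recognition": recognition, "detection": detection}}
    with open("predictions.json", "w") as f:
        json.dump(predictions, f)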

2.  Do I need to predict the instrument, verb and target separately?
No. While you may want to leverage the extra annotations provided for the instrument, verb, and target to improve your model, you are only required to predict the final triplet IDs as a vector of 100 probability scores.

3.  Will the inference pipeline preserve the temporal information?
Yes. Participants are free to train their models on either sequential or shuffled frames. During testing, we will use an input setup that preserves the temporal frame order within each video.

4.  What is the frame-rate for the test set?
1 FPS, the same as the training data.

5.  Is the testing going to be an online prediction?
Yes. We will maintain a real-time scenario during testing: our test input setup will collect your model's outputs at time t before feeding the input frame at time t+1. Your method can accumulate and utilize information from previous frames, but not from future ones.
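
Here is a minimal sketch of a causal (online) inference loop that respects this constraint; load_frame, model, and the file names are placeholders:

    def load_frame(path):
        """Placeholder: read and preprocess one frame."""
        return path

    def model(history):
        """Placeholder: map the frames seen so far to (recognition, detection)."""
        return [0.0] * 100, []

    frame_paths = [f"frame_{t:06d}.png" for t in range(3)]  # toy sequence
    past_frames = []

    for t, path in enumerate(frame_paths):
        past_frames.append(load_frame(path))
        # Causal constraint: the output at time t is collected before frame
        # t+1 is fed in, so only the current frame and history are used.
        recognition, detection = model(past_frames)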

6.  My model performance is quite low, do I still need to submit?
The triplet recognition task is generally challenging: the average performance of a random model is 0.01%. So if you beat this performance, you have a good method to submit to the competition.



--- Baseline Methods ---

1.  Where can I find a published paper/article on surgical triplet recognition?
* Tripnet [2]: the first deep learning model for the recognition of action triplets in surgical videos:
[Nwoye C.I. et al., Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets, MICCAI 2020]
Please note that the models in this paper are trained and evaluated on CholecT40 (a subset of CholecT50).

* Rendezvous [3]: the journal extension of the Tripnet baseline, trained on CholecT50:
[Nwoye C.I. et al., Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos]

* Summary of methods and results from the previous action triplet challenge [4]:
[Nwoye C.I. et al., CholecTriplet2021: A benchmark challenge for surgical action triplet recognition]

* Official dataset splits and benchmarking of baseline methods on the surgical action triplet dataset [5]:
[Nwoye C.I. et al., Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets]

2.  Where can I find a trained model (code) on triplet recognition?
Some code is available on the CAMMA public GitHub repo: https://github.com/CAMMA-public
We also provide sample code in the Colab notebook to help you get started.

Note that we do not provide the weights for any sample/published model.

3.  Must I follow the same strategy as in the published papers?
No, you are free to develop any method that works for you: deep learning, machine learning, rule-based inference, etc.

4.  Can I submit exactly the same model as in the published papers?
Submitting an original and novel method is highly recommended; however, you are not constrained in what you submit.



--- Training ---

1.  Is pretraining on a surgical dataset allowed?
Yes, you are free to pretrain your model on any third-party public dataset. However, the use of any private dataset is not allowed for this challenge.



--- Submission ---

1.  How do I submit my method?
Methods are to be submitted as a Docker file. We will provide a Docker template and submission guidelines by June 2022.
The submission channel will open on July 1, 2022.



--- Evaluation ---

1.  What are the metrics for the evaluation?
Our evaluation will be based on the mean average precision (mAP) provided by the ivtmetrics library [6]. ivtmetrics is a dedicated library for the evaluation of tool-tissue interaction detection. It can be installed with either the pip or conda Python package installer. More details about the metrics and their usage can be found on the Method and Evaluation page; a minimal usage sketch follows.
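
Here is a minimal recognition-mAP sketch based on the ivtmetrics README [6]; the API may evolve, so treat this as indicative and check the repo:

    # pip install ivtmetrics  (a conda package is also available; see [6])
    import numpy as np
    import ivtmetrics

    metric = ivtmetrics.Recognition(num_class=100)

    targets = np.random.randint(0, 2, (8, 100))  # toy binary ground truth
    predicts = np.random.rand(8, 100)            # toy probability scores

    metric.update(targets, predicts)             # accumulate a batch of frames
    results = metric.compute_global_AP(component="ivt")
    print(results["mAP"])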

2.  What will my model be evaluated on?
We will evaluate each model on 3 criteria:

  1. Triplet recognition performance (mean average precision, mAP).
  2. Instrument localization performance (mAP at a box IoU of 0.5; see the IoU sketch after this list).
  3. Triplet detection performance (correct triplet-box matching).
We plan to award a prize for the best model in each of the sub-tasks.
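
For criterion 2, here is a minimal sketch of the IoU test behind the 0.5 threshold, using the same [x, y, w, h] box format as the expected model output (the helper is illustrative; the official scoring is done by ivtmetrics [6]):

    def box_iou(a, b):
        """Intersection-over-union of two boxes given as [x, y, w, h]."""
        ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
        bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    # A predicted instrument box counts as a hit when IoU >= 0.5.
    print(box_iou([0.1, 0.1, 0.4, 0.4], [0.2, 0.2, 0.4, 0.4]))  # ~0.39 -> miss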


--- Publication ---

1.  Will my challenge submission be published?
We plan a joint publication on surgical action triplet detection, which will include the submitted challenge models and results. More information will be provided on this as time goes on.

2.  Who will be co-authors?
Each of the top N performing teams can submit at most 2 qualifying authors. The sub-challenge organizers determine the order of the authors in the joint challenge paper. The value of N will be announced before the challenge presentation and will depend on the level of participation.

3.  When can a participant publish an independent research on this dataset?
Participants are allowed to publish their own results on triplet recognition separately, using only the publicly released CholecT45 dataset [7]. However, triplet detection and localization results, which are the prime focus of this challenge, cannot be published until after the publication of the joint challenge paper. No publication can be made on CholecT50 [3] before the joint publication.

4.  When will the joint results be published?
This should be expected before the end of 2023.



--- References ---

[1] Twinanda, A. P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., & Padoy, N. (2016). EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Transactions on Medical Imaging, 36(1), 86-97.

[2] Nwoye, C. I., Gonzalez, C., Yu, T., Mascagni, P., Mutter, D., Marescaux, J., & Padoy, N. (2020, October). Recognition of instrument-tissue interactions in endoscopic videos via action triplets. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 364-374). Springer, Cham.

[3] Nwoye, C. I., Yu, T., Gonzalez, C., Seeliger, B., Mascagni, P., Mutter, D., Marescaux, J., & Padoy, N. (2022). Rendezvous: attention mechanisms for the recognition of surgical action triplets in endoscopic videos. Medical Image Analysis, 78, 102433. arXiv preprint arXiv:2109.03223.

[4] Nwoye, C. I., Alapatt, D., Yu, T., Vardazaryan, A., Xia, F., ..., & Padoy, N. (2022, April). CholecTriplet2021: A benchmark challenge for surgical action triplet recognition. arXiv preprint arXiv:2204.04746.

[5] Nwoye, C. I., & Padoy, N. (2022, April). Data splits and metrics for method benchmarking on surgical action triplet datasets. arXiv preprint arXiv:2204.05235.

[6] https://github.com/CAMMA-public/ivtmetrics

[7] https://github.com/CAMMA-public/cholect45