This is an open access article published by the IET under the Creative Commons Attribution-NonCommercial-NoDerivs License (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Computer-assisted interventions (CAI) aim to increase the effectiveness, precision and repeatability of procedures in order to improve surgical outcomes. The presence and motion of surgical tools are key inputs for CAI surgical phase recognition algorithms. Vision-based tool detection and recognition is an attractive approach because it can exploit the powerful deep learning paradigm that is rapidly advancing image recognition and classification. The main challenge for such algorithms is the availability and quality of labelled training data. In this Letter, surgical simulation is used to train tool detection and segmentation models based on deep convolutional neural networks and generative adversarial networks. The authors experiment with two network architectures for segmenting tool classes commonly encountered during cataract surgery. A commercially available simulator is used to create a simulated cataract dataset for training models prior to performing transfer learning on real surgical data. To the best of the authors' knowledge, this is the first attempt to train deep learning models for surgical instrument detection on simulated data, and it demonstrates promising generalisation to real data. The results indicate that simulated data has potential for training advanced classification methods for CAI systems.