Cascaded multi-level medical image registration method based on vision Transformer

Journal of Biomedical Engineering (《生物醫學工程學雜志》)

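Authors:

Pan Yingjie (潘英杰)^1, Cheng Yuanzhi (程遠志)^1,2, Liu Hao (劉豪)^1, Shi Cao (史操)^1

1. College of Information Science and Technology, Qingdao University of Science and Technology (Qingdao, Shandong 266000);
2. School of Computer Science and Technology, Harbin Institute of Technology (Harbin 150001)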

Keywords: medical imaging; multi-level registration; difficult deformation perception; cascaded network; self-attention mechanism

DOI: 10.7507/1001-5515.202204011


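In deep learning-based image registration, deformable regions with complex anatomical structures are an important factor limiting the registration accuracy of a network, yet existing methods have difficulty attending to these complex anatomical regions. In addition, the receptive field of a convolutional neural network is limited by the size of its convolution kernels, which makes it hard to learn relationships between voxels that are far apart in space and therefore hard to handle large-region deformations. To address these two problems, this paper proposes a cascaded multi-level registration network model based on the vision Transformer, equipped with a difficult deformation perceptron based on mean square error. The difficult deformation perceptron searches the registered image using sliding-window and floating-window techniques, obtains a difficult deformation coefficient for each voxel, and identifies the regions with the worst registration. In this study, the cascaded multi-level registration network model uses the difficult deformation perceptron to connect the levels, and relies on the self-attention mechanism in the basic registration network to extract global features and optimize the registration results at different scales. Experimental results show that the proposed method can progressively register complex deformation regions, thereby optimizing the registration of brain medical images and providing useful support for clinicians' diagnostic work.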

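Cite this article: Pan Yingjie, Cheng Yuanzhi, Liu Hao, Shi Cao. Cascaded multi-level medical image registration method based on vision Transformer. Journal of Biomedical Engineering, 2022, 39(5): 876-886. doi: 10.7507/1001-5515.202204011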

Introduction

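Deformable medical image registration is a fundamental task in medical image processing and analysis. Its goal is to find the voxel displacement between a reference image and the image to be registered (the moving image), and to identify and align the same or similar anatomical structures in the two images. Accurate medical image registration remains a challenging task.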

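Deep learning-based deformable medical image registration can be broadly divided into two categories by training strategy: supervised and unsupervised methods. Supervised registration methods require a ground-truth displacement field for each image pair during training, but these fields must first be obtained with traditional registration methods, which makes it hard for a supervised network to exceed the accuracy of the traditional methods^[1]. To remove this dependence on labels, many researchers turned to unsupervised registration. Jaderberg et al.^[2] proposed the spatial transformer network (STN), which supports backpropagation and warps the moving image directly with a deformation field; STN inspired many unsupervised image registration methods. Balakrishnan et al.^[3] combined a U-shaped network^[4] with STN to develop VoxelMorph (VM), which registers magnetic resonance imaging (MRI) brain atlases in an unsupervised manner and achieves registration solely by optimizing a custom loss function. Small-deformation regions in an image can usually be aligned in a single registration pass, whereas large-deformation regions often need several passes. The recursive cascaded networks of Zhao et al.^[5] showed that cascading can improve registration results, but every network in the cascade takes the full reference image and moving image as input, so small-deformation regions that are already aligned are processed again, which is redundant. To solve this problem, Kim et al.^[6] proposed CycleMorph (CM), which uses cycle consistency to perform multi-scale registration and refine the deformation field step by step. Huang et al.^[7] resampled intermediate network features to identify regions of interest; because this strategy depends on intermediate features, it is prone to false and missed detections of regions of interest early in training. Building on these studies, the method in this paper uses a cascaded multi-level network for multi-scale registration and identifies the worst-aligned regions with a fixed strategy, which keeps the identification results stable.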
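The warping step that STN-style registration relies on can be illustrated with a small sketch. The code below is not taken from the paper: it is a pure-Python, 2D simplification (the paper works on 3D volumes), and the function name `warp_image`, the `(dy, dx)` flow layout and the border-clamping rule are all assumptions made for illustration. It shows why the operation is compatible with backpropagation: each output value is a bilinear, and therefore piecewise-linear, function of the sampled coordinates.

```python
import math


def warp_image(moving, flow):
    """Warp a 2D image with a dense displacement field via bilinear sampling.

    moving: H x W list of floats (the image to be registered).
    flow:   H x W list of (dy, dx) tuples; output[y][x] samples `moving`
            at (y + dy, x + dx). Samples outside the image are clamped
            to the border (an assumption of this sketch).
    """
    h, w = len(moving), len(moving[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            # Sampling location in the moving image, clamped to the valid range.
            sy = min(max(y + dy, 0.0), h - 1.0)
            sx = min(max(x + dx, 0.0), w - 1.0)
            y0, x0 = int(math.floor(sy)), int(math.floor(sx))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - wx) * moving[y0][x0] + wx * moving[y0][x1]
            bottom = (1 - wx) * moving[y1][x0] + wx * moving[y1][x1]
            warped[y][x] = (1 - wy) * top + wy * bottom
    return warped


image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
half_pixel_right = [[(0.0, 0.5)] * 3 for _ in range(3)]
print(warp_image(image, half_pixel_right)[1])
```

A zero displacement field returns the input unchanged, which is a convenient sanity check for any warping layer.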

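Convolutional neural networks (CNNs) are widely used in image registration, but the receptive field of a convolution is limited by the kernel size, which makes it hard to learn global image features. Li et al.^[8] found that, as convolutional layers deepen, the mutual influence between distant voxels decays rapidly, so a CNN struggles to learn global feature relationships in an image. The vision Transformer, with its self-attention mechanism, effectively addresses the inability of CNNs to extract global image features^[9]. For example, Liu et al.^[10] proposed a hierarchical Transformer with shifted windows whose computational cost grows only linearly with the number of windows, easing the problem of high computational cost. Chen et al.^[11] were the first to apply the Transformer to unsupervised registration, combining it with the V-shaped segmentation network^[12]; in later work they extended that model and proposed TransMorph (TM) to capture the semantic relationship between the images of a pair, and demonstrated the effectiveness and state-of-the-art performance of the architecture with quantitative results^[13]. Nevertheless, research on applying Transformers to medical image registration is still at an early stage.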
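The contrast between a kernel-limited receptive field and self-attention can be made concrete with a minimal sketch of scaled dot-product attention^[16]. The sketch below is illustrative only: it is pure Python, it omits the learned query, key and value projections and the multi-head structure of a real Transformer, and the names `softmax` and `self_attention` are hypothetical. The point is that every token receives a non-zero weight from every other token, regardless of how far apart they are in the sequence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each argument is a list of token vectors. Learned projections are
    omitted in this sketch, so callers pass the vectors directly.
    Returns (outputs, attention_weights).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all value vectors.
        outputs.append(
            [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]
        )
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
print(weights[0])  # one weight per token; each row sums to 1
```

The cost of this global view is that the score matrix grows quadratically with the number of tokens, which is the computational cost that window-based variants such as the shifted-window Transformer^[10] aim to reduce.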

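To address the problems above, this paper proposes a Transformer-based cascaded multi-level registration network model for medical image registration. The approach consists of: ① applying standard preprocessing to the raw brain MRI atlases; ② building a basic registration network in which a CNN extracts local features and a Transformer extracts global features; ③ using the difficult deformation perceptron to extract complex deformation regions and cascading several basic registration networks in a multi-level scheme, progressively optimizing the registration results at different scales to resolve complex deformations in image registration. In summary, the authors expect the proposed method to progressively refine registration results and improve registration accuracy, and in future to help clinicians make more accurate judgments in clinical diagnosis.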

1 Methods

1.1 Overall architecture

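A part belongs to a whole: in computer vision, a whole image can be regarded as a collection of local images, and improving the quality of the local images improves the quality of the whole. Well-aligned regions are constrained by the reference image, so further optimization of them rarely brings a notable gain in overall registration performance; poorly aligned regions, by contrast, have more room for improvement, and re-optimizing them can markedly improve overall performance. The Transformer-based cascaded multi-level registration network model uses a difficult deformation perceptron based on the difficult deformation coefficient (DDC) to select the poorly aligned regions of an image, and optimizes poorly aligned regions of different scales level by level. In this model, a region is poorly aligned relative to the other local regions of the same image, and such regions are also called difficult deformation regions in this paper; even when the overall registration result is good, the multi-level registration network can still select the most difficult deformation region of the image for optimization. The overall architecture of the model is shown in Figure 1.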
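The idea of selecting the worst-aligned region can be sketched in a few lines. The code below is a simplified stand-in, not the paper's difficult deformation perceptron: the DDC and the combination of sliding and floating windows are defined in Section 1.3, which is not reproduced here. This sketch assumes a 2D image, a single fixed-size sliding window, and the plain mean square error between the reference image and the warped image as the score; the function name `worst_aligned_window` and its parameters are hypothetical.

```python
def worst_aligned_window(fixed, warped, size, stride=1):
    """Return ((top, left), mse) for the size x size window with the largest MSE.

    fixed, warped: H x W lists of floats (reference image and registered image).
    The window with the highest mean square error is treated as the
    worst-aligned region; ties keep the first window found.
    """
    h, w = len(fixed), len(fixed[0])
    best_pos, best_mse = None, -1.0
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            total = 0.0
            for y in range(top, top + size):
                for x in range(left, left + size):
                    diff = fixed[y][x] - warped[y][x]
                    total += diff * diff
            mse = total / (size * size)
            if mse > best_mse:
                best_pos, best_mse = (top, left), mse
    return best_pos, best_mse


fixed = [[0.0] * 4 for _ in range(4)]
warped = [[0.0] * 4 for _ in range(4)]
warped[2][3] = 2.0  # residual misalignment in the lower-right corner
warped[3][3] = 2.0
print(worst_aligned_window(fixed, warped, size=2))
```

Because the score is relative, some window is always returned even when the overall error is small, which matches the statement above that the most difficult region can still be selected when the global result is already good. In a cascade, the selected window would be cropped from both images and passed to the next level's registration network.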

Figure 1. Cascaded multi-level registration network model


Method	LPBA40 DSC	LPBA40 NJD (%)	OASIS-1 DSC	OASIS-1 NJD (%)
Original images	0.538 ± 0.050	-	0.576 ± 0.065	-
SYN	0.687 ± 0.025	< 0.000001	0.768 ± 0.033	0.000156
CM	0.668 ± 0.046	0.060322	0.779 ± 0.034	0.424978
VM	0.660 ± 0.044	0.042571	0.791 ± 0.027	0.294273
TM	0.673 ± 0.047	0.040855	0.804 ± 0.024	0.278083
Proposed method	0.689 ± 0.048	0.036250	0.812 ± 0.023	0.299963
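For readers unfamiliar with the DSC column, the Dice similarity coefficient^[26] of two label maps for one structure is 2|A ∩ B| / (|A| + |B|). The sketch below shows that formula on flattened label lists; it is a generic illustration, and the function name, the per-label interface and the convention of returning 1.0 when a label is absent from both maps are assumptions. How the paper averages DSC over anatomical structures and subjects is specified in its Section 2.3, which is not reproduced here.

```python
def dice_coefficient(labels_a, labels_b, label):
    """Dice similarity coefficient of one label between two flattened label maps."""
    count_a = sum(1 for v in labels_a if v == label)
    count_b = sum(1 for v in labels_b if v == label)
    overlap = sum(
        1 for va, vb in zip(labels_a, labels_b) if va == label and vb == label
    )
    if count_a + count_b == 0:
        return 1.0  # label absent from both maps; convention chosen for this sketch
    return 2.0 * overlap / (count_a + count_b)


reference_labels = [1, 1, 0, 2]
warped_labels = [1, 0, 0, 2]
print(dice_coefficient(reference_labels, warped_labels, label=1))
```

A value of 1 means the two segmentations of that structure coincide exactly, and 0 means they do not overlap at all.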

Level	LPBA40 DSC	OASIS-1 DSC
First level	0.685 ± 0.045	0.799 ± 0.024
Second level	0.688 ± 0.048	0.811 ± 0.023
Third level	0.689 ± 0.050	0.813 ± 0.024
Fourth level	0.690 ± 0.050	0.814 ± 0.026

Image type	RMSE	SSIM	MI
Global image	0.025415	0.918959	0.764806
Local image	0.045127	0.811169	0.686444

1.	Haskins G, Kruger U, Yan Pingkun. Deep learning in medical image registration: a survey. arXiv: 1903.02026, 2020. https://doi.org/10.48550/arXiv.1903.02026.
2.	Jaderberg M, Simonyan K, Zisserman A, et al. Spatial transformer networks//the 28th International Conference on Neural Information Processing Systems-Volume 2 (NIPS), 2015. https://doi.org/10.48550/arXiv.1506.02025.
3.	Balakrishnan G, Zhao A, Sabuncu M R, et al. An unsupervised learning model for deformable medical image registration//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018. https://doi.org/10.48550/arXiv.1802.02604.
4.	Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation//International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Cham: Springer, 2015: 234-241.
5.	Zhao Shengyu, Dong Yue, Chang E I, et al. Recursive cascaded networks for unsupervised medical image registration//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2019: 10600-10610. https://doi.org/10.48550/arXiv.1907.12353.
6.	Kim B, Kim D H, Park S H, et al. CycleMorph: cycle consistent unsupervised deformable image registration. Med Image Anal, 2021, 71: 102036.
7.	Huang Y, Ahmad S, Fan J, et al. Difficulty-aware hierarchical convolutional neural networks for deformable registration of brain MR images. Med Image Anal, 2021, 67: 101817.
8.	Li Shaohua, Sui Xiuchao, Luo Xiangde, et al. Medical image segmentation using squeeze-and-expansion transformers. arXiv: 2105.09511, 2021. https://doi.org/10.48550/arXiv.2105.09511.
9.	Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16 × 16 words: transformers for image recognition at scale. arXiv: 2010.11929, 2020. https://doi.org/10.48550/arXiv.2010.11929.
10.	Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows//The IEEE/CVF International Conference on Computer Vision (ICCV). 2021: 10012-10022.
11.	Chen Junyu, He Yufan, Frey E C, et al. ViT-V-Net: vision transformer for unsupervised volumetric medical image registration. arXiv: 2104.06468, 2021. https://doi.org/10.48550/arXiv.2104.06468.
12.	Milletari F, Navab N, Ahmadi S A. V-net: fully convolutional neural networks for volumetric medical image segmentation//2016 fourth international conference on 3D vision (3DV). IEEE, 2016: 565-571.
13.	Chen Junyu, Du Yong, He Yufan, et al. TransMorph: Transformer for unsupervised medical image registration. arXiv: 2111.10480, 2021. https://doi.org/10.48550/arXiv.2111.10480.
14.	Vercauteren T, Pennec X, Perchant A, et al. Diffeomorphic demons: efficient non-parametric image registration. Neuroimage, 2009, 45(1 Suppl): S61-S72.
15.	Zhou H Y, Guo J, Zhang Y, et al. nnFormer: interleaved transformer for volumetric segmentation. arXiv: 2109.03201, 2021. https://doi.org/10.48550/arXiv.2109.03201.
16.	Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need// 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach: NIPS, 2017: 6000-6010.
17.	Zeiler M D, Taylor G W, Fergus R. Adaptive deconvolutional networks for mid and high level feature learning//2011 International Conference on Computer Vision (ICCV). Barcelona: IEEE, 2011: 12491108.
18.	Allen D M. Mean square error of prediction as a criterion for selecting variables. Technometrics, 1971, 13(3): 469-475.
19.	Shattuck DW, Mirza M, Adisetiyo V, et al. Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage, 2008, 39(3): 1064-1080.
20.	Marcus DS, Wang TH, Parker J, et al. Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J Cogn Neurosci, 2007, 19(9): 1498-1507.
21.	Fischl B. FreeSurfer. NeuroImage, 2012, 62(2): 774-781.
22.	Avants B B, Epstein C L, Grossman M, et al. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med Image Anal, 2008, 12(1): 26-41.
23.	Klein A, Andersson J, Ardekani B A, et al. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. Neuroimage, 2009, 46(3): 786-802.
24.	Avants B B, Tustison N J, Song G, et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage, 2011, 54(3): 2033-2044.
25.	Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library// the 33rd International Conference on Neural Information Processing Systems (NeurIPS 2019), 2019. https://doi.org/10.48550/arXiv.1912.01703.
26.	Dice L R. Measures of the amount of ecologic association between species. Ecology, 1945, 26(3): 297-302.
27.	Dacorogna B, Moser J. On a partial differential equation involving the jacobian determinant. Annales de l'Institut Henri Poincaré C, Analyse non linéaire, 1990, 7(1): 1-26.
28.	Wang Shiqi, Rehman A, Wang Zhou, et al. SSIM-motivated rate-distortion optimization for video coding. IEEE Transactions on Circuits and Systems for Video Technology, 2011, 22(4): 516-529.
29.	Viola P, Wells III W M. Alignment by maximization of mutual information//IEEE International Conference on Computer Vision, 1995: 16-23. DOI: 10.1109/ICCV.1995.466930.


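Introduction

1 Methods

1.1 Overall architecture

1.2 Basic registration network

1.2.1 CNN encoder-decoder

1.2.2 Transformer encoder-decoder

1.2.3 Feature guidance module

1.3 Difficult deformation perceptron

1.4 Spatial transformer network

1.5 Loss function

2 Experiments

2.1 Datasets

2.2 Experimental configuration

2.3 Evaluation metrics

2.4 Results analysis

2.4.1 Registration accuracy

2.4.2 Level comparison

2.4.3 Region-of-interest identification

3 Conclusion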
