# IGEV-Stereo & IGEV-MVS

This repository contains the source code for our paper:

[Iterative Geometry Encoding Volume for Stereo Matching](https://arxiv.org/pdf/2303.06615.pdf)<br/>
CVPR 2023 <br/>
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang<br/>

<img src="IGEV-Stereo/IGEV-Stereo.png">

## Demos

Pretrained models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1SsMHRyN7808jDViMN1sKz1Nx-71JxUuz?usp=share_link).

We assume the downloaded pretrained weights are located under the `pretrained_models` directory.

You can demo a trained model on pairs of images. To predict disparity on Middlebury image pairs, run

```
python demo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth
```
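
Before running the demo, it can help to confirm that the checkpoint is where the command expects it. The following is a minimal sketch, assuming the weights are stored as `.pth` files somewhere below `pretrained_models` (as the path in the command above suggests). The helper name `list_checkpoints` is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_checkpoints(root="./pretrained_models"):
    """Return all .pth files found under root, sorted by path.

    Hypothetical helper: assumes weights are stored as .pth files
    somewhere below the pretrained_models directory.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing downloaded yet, or the script runs from another directory.
        return []
    return sorted(root_path.rglob("*.pth"))


# Print every checkpoint that is currently available.
for ckpt in list_checkpoints():
    print(ckpt)
```

If the list is empty, the `--restore_ckpt` path passed to `demo.py` will most likely not resolve either.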

<img src="IGEV-Stereo/demo-imgs.png" width="90%">

## Comparison with RAFT-Stereo

| Method | KITTI 2012 <br> (3-noc) | KITTI 2015 <br> (D1-all) | Memory (GB) | Runtime (s) |
|:-:|:-:|:-:|:-:|:-:|
| RAFT-Stereo | 1.30 % | 1.82 % | 1.02 | 0.38 |
| IGEV-Stereo | 1.12 % | 1.59 % | 0.66 | 0.18 |
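
To make the comparison easier to read, the relative differences can be computed directly from the table. This is a small arithmetic sketch using only the numbers reported above; the function name `relative_reduction` is chosen here for illustration.

```python
# Values copied from the table above: (RAFT-Stereo, IGEV-Stereo).
results = {
    "KITTI 2012 3-noc (%)": (1.30, 1.12),
    "KITTI 2015 D1-all (%)": (1.82, 1.59),
    "Memory": (1.02, 0.66),
    "Runtime (s)": (0.38, 0.18),
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100.0


for name, (raft, igev) in results.items():
    print(f"{name}: {relative_reduction(raft, igev):.1f}% lower")
```

On these numbers, IGEV-Stereo has roughly 13 to 14 % lower error rates, about 35 % lower memory use, and about 53 % lower runtime than RAFT-Stereo.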

## Environment

* NVIDIA RTX 3090
* Python 3.8
* PyTorch 1.12

### Create a virtual environment and activate it

```
conda create -n IGEV_Stereo python=3.8
conda activate IGEV_Stereo
```

### Dependencies

```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install tqdm
pip install timm==0.5.4
```
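
After installation, a quick check of what actually ended up in the environment can save a failed training run. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8) and does not import the packages themselves. The list of distribution names is an assumption derived from the install commands above; in particular, the conda `pytorch` package is assumed to register itself as `torch`.

```python
from importlib import metadata

# Distribution names assumed from the install commands above
# (the conda "pytorch" package is assumed to register as "torch").
REQUIRED = ["torch", "torchvision", "opencv-python", "scikit-image",
            "tensorboard", "matplotlib", "tqdm", "timm"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report one line per package.
for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'MISSING'}")
```

A package reported as `MISSING` may still be importable if it was installed without distribution metadata, so treat the output as a hint rather than a verdict.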

## Required Data

To evaluate or train IGEV-Stereo, you will need to download the required datasets.

* [Scene Flow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
* [KITTI](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
* [Middlebury](https://vision.middlebury.edu/stereo/data/)
* [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-test-data)

By default, `stereo_datasets.py` will search for the datasets in these locations:

```
├── /data
    ├── sceneflow
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2012
            ├── training
            ├── testing
            ├── vkitti
        ├── KITTI_2015
            ├── training
            ├── testing
            ├── vkitti
    ├── Middlebury
        ├── trainingH
        ├── trainingH_GT
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
    ├── DTU_data
        ├── dtu_train
        ├── dtu_test
```
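
A short script can confirm that the datasets are laid out as expected before a long job starts. This is a minimal sketch, assuming the nested layout suggested by the listing above (for example, `training` inside `KITTI_2012` inside `KITTI`). The helper `missing_dirs` and the `EXPECTED_DIRS` list are illustrative and not part of `stereo_datasets.py`.

```python
from pathlib import Path

# Relative paths assumed from the directory listing above.
EXPECTED_DIRS = [
    "sceneflow/frames_finalpass",
    "sceneflow/disparity",
    "KITTI/KITTI_2012/training",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2015/training",
    "KITTI/KITTI_2015/testing",
    "Middlebury/trainingH",
    "Middlebury/trainingH_GT",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "DTU_data/dtu_train",
    "DTU_data/dtu_test",
]


def missing_dirs(root="/data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).is_dir()]


# Report anything that is absent from the default data root.
for rel in missing_dirs():
    print("missing:", rel)
```

Only the datasets you intend to use need to be present, so a missing entry matters only for the corresponding training or evaluation run.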

## Evaluation

To evaluate on Scene Flow, Middlebury, or ETH3D, run the corresponding command:

```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset sceneflow
```

or

```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset middlebury_H
```

or

```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset eth3d
```
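
The three commands differ only in the `--dataset` value, so they can be generated from one list. The sketch below only builds and prints the commands; the function name `build_eval_commands` is hypothetical, while the script name, flags, and dataset names are taken from the commands above.

```python
import shlex

CKPT = "./pretrained_models/sceneflow/sceneflow.pth"
DATASETS = ["sceneflow", "middlebury_H", "eth3d"]


def build_eval_commands(ckpt=CKPT, datasets=DATASETS):
    """Return one argv list per dataset for evaluate_stereo.py."""
    return [
        ["python", "evaluate_stereo.py", "--restore_ckpt", ckpt, "--dataset", name]
        for name in datasets
    ]


# Print the commands in a copy-paste friendly form.
for cmd in build_eval_commands():
    print(shlex.join(cmd))
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `evaluate_stereo.py`.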

## Training

To train on Scene Flow, run

```Shell
python train_stereo.py
```

To train on KITTI, starting from the Scene Flow checkpoint, run

```Shell
python train_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset kitti
```

## Submission

For submission to the KITTI benchmark, run

```Shell
python save_disp.py
```

## MVS training and evaluation

To train on DTU, run

```Shell
python train_mvs.py
```

To evaluate on DTU, run

```Shell
python evaluate_mvs.py
```

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}
```

## Acknowledgements

This project is heavily based on [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo). We thank the original authors for their excellent work.