# IGEV-Stereo & IGEV-MVS

This repository contains the source code for our paper:

[Iterative Geometry Encoding Volume for Stereo Matching](https://arxiv.org/pdf/2303.06615.pdf)<br/>
CVPR 2023 <br/>
Gangwei Xu, Xianqi Wang, Xiaohuan Ding, Xin Yang<br/>

<img src="IGEV-Stereo/IGEV-Stereo.png">

## Demos
Pretrained models can be downloaded from [Google Drive](https://drive.google.com/drive/folders/1SsMHRyN7808jDViMN1sKz1Nx-71JxUuz?usp=share_link).

The commands below assume the downloaded pretrained weights are located under the `pretrained_models` directory.

You can demo a trained model on pairs of images. To predict disparity on Middlebury, run
```
python demo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth
```

<img src="IGEV-Stereo/demo-imgs.png" width="90%">

## Comparison with RAFT-Stereo

| Method | KITTI 2012 <br> (3-noc) | KITTI 2015 <br> (D1-all) | Memory (G) | Runtime (s) |
|:-:|:-:|:-:|:-:|:-:|
| RAFT-Stereo | 1.30 % | 1.82 % | 1.02 | 0.38 |
| IGEV-Stereo | 1.12 % | 1.59 % | 0.66 | 0.18 |
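
As a rough reading aid, the relative reductions implied by the table can be worked out with a few lines of Python. This is only arithmetic on the numbers shown above; the helper name `relative_reduction` is made up for this sketch and is not part of the repository.

```python
# Values copied from the table above as (RAFT-Stereo, IGEV-Stereo) pairs.
RESULTS = {
    "KITTI 2012 3-noc (%)": (1.30, 1.12),
    "KITTI 2015 D1-all (%)": (1.82, 1.59),
    "Memory (G)": (1.02, 0.66),
    "Runtime (s)": (0.38, 0.18),
}


def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline` (hypothetical helper)."""
    return 100.0 * (baseline - ours) / baseline


for name, (raft, igev) in RESULTS.items():
    print(f"{name}: {relative_reduction(raft, igev):.1f}% lower than RAFT-Stereo")
```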


## Environment
* NVIDIA RTX 3090
* Python 3.8
* PyTorch 1.12

### Create a virtual environment and activate it.

```
conda create -n IGEV_Stereo python=3.8
conda activate IGEV_Stereo
```
### Dependencies

```
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib 
pip install tqdm
pip install timm==0.5.4
```
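
After installing, a quick sanity check can confirm that the packages are importable. The sketch below is not part of the repository; `find_missing` and `REQUIRED_MODULES` are hypothetical names, and the list simply mirrors the install commands above.

```python
import importlib.util

# Import names for the packages installed above. The import name can differ
# from the pip name (opencv-python -> cv2, scikit-image -> skimage).
REQUIRED_MODULES = [
    "torch", "torchvision", "cv2", "skimage",
    "tensorboard", "matplotlib", "tqdm", "timm",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        import torch
        print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, the PyTorch build and the installed driver probably do not match.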

## Required Data
To evaluate/train IGEV-Stereo, you will need to download the required datasets. 
* [Scene Flow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html)
* [KITTI](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
* [Middlebury](https://vision.middlebury.edu/stereo/data/)
* [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-test-data)

By default, `stereo_datasets.py` searches for the datasets in the following locations:

```
├── /data
    ├── sceneflow
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── KITTI_2012
            ├── training
            ├── testing
            ├── vkitti
        ├── KITTI_2015
            ├── training
            ├── testing
            ├── vkitti
    ├── Middlebury
        ├── trainingH
        ├── trainingH_GT
    ├── ETH3D
        ├── two_view_training
        ├── two_view_training_gt
    ├── DTU_data
        ├── dtu_train
        ├── dtu_test
```
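
Before launching a long run, it can help to verify that this layout is actually in place. The following is a minimal sketch, not a script shipped with the repository: `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and the paths are copied from the tree above under the assumption that the data root is `/data`.

```python
from pathlib import Path

# Sub-directories expected under the data root, copied from the tree above.
EXPECTED_DIRS = [
    "sceneflow/frames_finalpass",
    "sceneflow/disparity",
    "KITTI/KITTI_2012/training",
    "KITTI/KITTI_2012/testing",
    "KITTI/KITTI_2012/vkitti",
    "KITTI/KITTI_2015/training",
    "KITTI/KITTI_2015/testing",
    "KITTI/KITTI_2015/vkitti",
    "Middlebury/trainingH",
    "Middlebury/trainingH_GT",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
    "DTU_data/dtu_train",
    "DTU_data/dtu_test",
]


def missing_dirs(root="/data", expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs():
        print("missing:", rel)
```

Only the datasets you intend to use need to be present; for example, the DTU entries matter only for IGEV-MVS.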

## Evaluation

To evaluate on Scene Flow, Middlebury, or ETH3D, run

```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset sceneflow
```
or
```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset middlebury_H
```
or
```Shell
python evaluate_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset eth3d
```
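
To run all three evaluations in one go, the commands above can be wrapped in a small driver. This is a sketch rather than a repository script: `build_command` and `run_all` are hypothetical helpers, and the only flags used are the `--restore_ckpt` and `--dataset` options shown above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

CKPT = "./pretrained_models/sceneflow/sceneflow.pth"
DATASETS = ["sceneflow", "middlebury_H", "eth3d"]


def build_command(dataset, ckpt=CKPT):
    """Build the argument list for one evaluate_stereo.py run."""
    return [sys.executable, "evaluate_stereo.py",
            "--restore_ckpt", ckpt, "--dataset", dataset]


def run_all(datasets=DATASETS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in turn."""
    for dataset in datasets:
        cmd = build_command(dataset)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first evaluation that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Call `run_all(dry_run=False)` from the directory containing `evaluate_stereo.py` to actually launch the evaluations.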

## Training

To train on Scene Flow, run

```Shell
python train_stereo.py
```

To train on KITTI, run
```Shell
python train_stereo.py --restore_ckpt ./pretrained_models/sceneflow/sceneflow.pth --dataset kitti
```

## Submission

For submission to the KITTI benchmark, run
```Shell
python save_disp.py
```

## MVS training and evaluation

To train on DTU, run

```Shell
python train_mvs.py
```

To evaluate on DTU, run

```Shell
python evaluate_mvs.py
```

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{xu2023iterative,
  title={Iterative Geometry Encoding Volume for Stereo Matching},
  author={Xu, Gangwei and Wang, Xianqi and Ding, Xiaohuan and Yang, Xin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}
```


## Acknowledgements

This project is heavily based on [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo). We thank the original authors for their excellent work.