
Training RoomFormer on Satellite Imagery (TIFF & GeoJSON Masks) #28

Open
spaceie08 opened this issue Feb 6, 2025 · 4 comments

@spaceie08

Hello,

I’ve been exploring RoomFormer and am very impressed with its approach to floorplan reconstruction. I’m interested in applying the model to a different type of 2D data. I have a dataset consisting of satellite images (in TIFF format) along with corresponding segmentation masks provided as GeoJSON files.

Could you please advise on the following:

Data Preprocessing: How should I adapt the current data pipeline to handle TIFF images and GeoJSON mask annotations? Are there any recommended modifications to the preprocessing steps (e.g., converting TIFF images or handling GeoJSON data) to fit the input requirements of RoomFormer?
Model Adjustments: Do I need to make any changes to the model architecture or training scripts to work effectively with satellite imagery, which may have different resolution or content characteristics compared to the original dataset?
Training Considerations: Are there specific best practices or potential pitfalls you’d recommend when adapting RoomFormer for satellite imagery tasks?
Any guidance or pointers to additional resources would be greatly appreciated. Thank you for your time and for developing such an exciting model!

@ywyue
Owner

ywyue commented Feb 9, 2025

Hi @spaceie08, thanks for your interest and kind words!

  • Data Preprocessing:
    • Input: The original RoomFormer was designed to take a density map (256×256×1) as input, so the first layer expects a single channel. You need to adjust the input channel count of that layer (e.g. from 1 to 3 for RGB) to make it compatible with your TIFF images:
      backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

      In addition, you may want to make all input images the same size (e.g. local crops of the TIFF images): it makes little sense to feed entire TIFF images as input, since they are large and differ in size and resolution. (A tiling sketch follows after this list.)
    • Annotation: The original dataloader of RoomFormer requires COCO-format annotations, which store the polygons in each image as a list of lists (i.e. a sequence of ordered vertices). If you only have segmentation mask annotations, you first need to generate polygonal annotations and then convert them to COCO format. Please check our preprocessing code: https://github.com/ywyue/RoomFormer/blob/main/data_preprocess/stru3d/generate_coco_stru3d.py (a conversion sketch also follows after this list).
      One side note: if you don't want to generate polygonal annotations, you may try training the model using only the segmentation mask annotations. But that requires some adaptation of the training and the model, e.g. keeping a fixed number of vertices in each polygon and removing the vertex coordinate regression loss.
  • Model Adjustments: I assume you want to reconstruct building boundaries from satellite imagery. If that is the case and you do the preprocessing correctly, there are some things you need to modify:
    • Please make sure to set --num_polys >= the maximum number of buildings in an image and --num_queries >= --num_polys × the maximum number of corners per building.
    • In satellite imagery, there may be partially occluded buildings, usually around the image boundary. You may consider masking out those parts in the loss computation.
  • Training Considerations: After data preprocessing and model adjustments, a good starting point is to try to overfit the model on a single training image to see whether the model works. If everything is fine, the model should give (almost) perfect predictions. Otherwise, it makes no sense to scale training up to the whole dataset.
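To illustrate the cropping point above, here is a minimal sketch of cutting a large GeoTIFF into fixed-size tiles with rasterio. The library choice, file path, and tile size are assumptions for illustration, not part of RoomFormer's pipeline:

    # Sketch: tile a large GeoTIFF into fixed-size crops for training.
    # rasterio, the path, and the tile size are assumptions, not RoomFormer code.
    import rasterio
    from rasterio.windows import Window

    TILE = 256  # match the model's expected input size

    with rasterio.open("scene.tif") as src:
        for row in range(0, src.height - TILE + 1, TILE):
            for col in range(0, src.width - TILE + 1, TILE):
                window = Window(col, row, TILE, TILE)
                tile = src.read(window=window)  # shape: (bands, TILE, TILE)
                transform = src.window_transform(window)
                # `transform` georeferences this crop; use it later to map
                # GeoJSON coordinates into tile pixel coordinates.
                # ... save `tile` and its metadata for label generation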
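And for the annotation conversion, a rough sketch of turning GeoJSON polygon features into COCO-style entries, loosely modeled on the idea in generate_coco_stru3d.py but not taken from it. The helper name is hypothetical, only simple Polygon geometries (outer ring, no holes) are handled, and `transform` is the rasterio affine transform from the tiling sketch above:

    # Sketch: convert GeoJSON polygon features into COCO-style annotations.
    # `transform` is the rasterio affine transform of the image/tile.
    import json

    def geojson_to_coco(geojson_path, image_id, transform, start_ann_id=0):
        with open(geojson_path) as f:
            features = json.load(f)["features"]
        annotations = []
        for i, feat in enumerate(features):
            exterior = feat["geometry"]["coordinates"][0]  # outer ring only
            # Geographic -> pixel coordinates, then flatten to COCO's
            # [x0, y0, x1, y1, ...] vertex list.
            pixels = [~transform * (x, y) for x, y in exterior]
            flat = [round(c, 2) for xy in pixels for c in xy]
            xs, ys = flat[0::2], flat[1::2]
            annotations.append({
                "id": start_ann_id + i,
                "image_id": image_id,
                "category_id": 1,  # single "building" class
                "segmentation": [flat],
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": 0.0,  # fill in the polygon area if your loader needs it
                "iscrowd": 0,
            })
        return annotations

On the --num_polys / --num_queries advice: if, say, your crops contain at most 30 buildings with at most 30 corners each, you would pass --num_polys 30 --num_queries 900 (placeholder numbers; derive them from your own dataset statistics).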

There is another similar issue regarding extending RoomFormer for outdoor building outline extraction from satellite imagery, which you may find helpful: #22

Hope the answers are helpful!

@spaceie08
Author

Hi,

Thank you so much for your guidance and suggestions! I’ve implemented the changes you recommended and wanted to update you on my progress.

  • Data Preprocessing:

    • I adjusted the first convolution layer to accept 3-channel inputs instead of a single channel:
      backbone.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    • I converted the dataset annotations into the COCO format, ensuring that polygonal annotations are correctly handled.
  • Model Adjustments:

    • Registered the new dataset format ("fields") within the dataset loader:
      if args.dataset_name == 'stru3d' or args.dataset_name == 'scenecad' or args.dataset_name == 'fields':
          return build_poly(image_set, args)
    • Updated the evaluator to handle the "fields" dataset with a custom S3DRW evaluator:
      elif dataset_name == 'fields':
          curr_opts = copy.deepcopy(opts)
          curr_opts.scene_id = "scene_0" + str(scene_ids[i])
          curr_data_rw = S3DRW(curr_opts, mode="train")
          evaluator = Evaluator(curr_data_rw, curr_opts)
    • Adjusted image dimensions and channel order with:
      record['image'] = (1/255) * torch.as_tensor(np.ascontiguousarray(np.transpose(img, (2, 0, 1))))  # HWC uint8 -> CHW float in [0, 1]
  • Training Considerations:

    • Despite implementing these adjustments, the results so far have not been satisfactory. The model is struggling to produce meaningful predictions.

I’ve attached the git diff file that captures all the changes I made, including the modifications in datasets/__init__.py, poly_data.py, engine.py, backbone.py, and the evaluation adjustments for the "fields" dataset. There were also binary changes in native_rasterizer and CUDA-related files, which may or may not impact performance; perhaps I missed something crucial.

Could you please take a look and confirm if these modifications align with your suggestions? I’d appreciate your thoughts on whether the changes I made are correct and if there’s anything I should adjust.

Also, you mentioned a similar issue (#22) where you trained RoomFormer on a building dataset and shared the code with the author of that issue. I’d be really grateful if you could share that code with me as well. Please feel free to send it to my email at [email protected]

diff.log


Looking forward to your feedback and any advice you may have to help improve the model's performance!

Best regards,
Jazib

@ywyue
Owner

ywyue commented Feb 28, 2025

Hi @spaceie08, I have sent you the code that adapts RoomFormer to the outdoor CrowdAI Mapping dataset.

I also checked the diff log you uploaded here. It is hard to tell whether everything is correct just by looking at this log. I have some suggestions that you could check:

  • Visualize/plot your ground truth and check whether it makes sense.
  • What do the wandb training/validation curves look like?
  • Did you manage to overfit the network on a single data sample? (A sketch of such a check follows below.)
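For reference, a minimal single-sample overfit loop, written as a generic PyTorch sketch rather than RoomFormer's actual training code; `model`, `criterion`, and the batch format stand in for whatever the training script builds:

    # Sketch: sanity check by overfitting on one training sample.
    # `model`, `criterion`, `sample`, `targets` are stand-ins for the objects
    # the real training script constructs; this is not RoomFormer's train loop.
    import torch

    def overfit_one_sample(model, criterion, sample, targets, steps=500, lr=1e-4):
        model.train()
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        for step in range(steps):
            optimizer.zero_grad()
            outputs = model(sample)
            loss_dict = criterion(outputs, targets)
            loss = sum(loss_dict.values())  # assuming a DETR-style loss dict
            loss.backward()
            optimizer.step()
            if step % 50 == 0:
                print(f"step {step}: loss = {loss.item():.4f}")
        # If the loss does not approach zero, debug the data and labels
        # before scaling training to the full dataset.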

@Just5D

Just5D commented Mar 26, 2025

Could you share the code that adapts RoomFormer to the outdoor dataset CrowdAI Mapping? THX
