Abstract
Building Information Modeling (BIM) has emerged as a crucial digital platform in the architecture, engineering, and construction (AEC) industry. Yet the lack of large-scale, class-balanced BIM component datasets hinders AI-driven analysis. We introduce BIMCompNet, a multimodal dataset based on the Industry Foundation Classes (IFC) standard, which enables learning from diverse geometric representations, including rendered views, point clouds, mesh structures, and semantic graphs.
BIMCompNet is built via a standardized two-stage pipeline. (1) Model level: normalize geometry to SI units, convert to IFC, anonymize metadata, and extract components into individual IFC files. (2) Component level: correct semantic labels, align geometry and positioning, remove duplicates across models and projects, and generate five synchronized modalities (OBJ meshes, multi-view images, point clouds, voxels, and heterogeneous graphs).
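To illustrate the alignment and deduplication steps, here is a minimal Python sketch. It is not the authors' implementation: the function names `geometry_key` and `deduplicate`, the component dictionary layout, and the idea of hashing unit-scaled, origin-shifted, rounded vertices are all assumptions made for illustration. The sketch treats two components as duplicates when they share an IFC class and produce the same hash, which only works if vertex order is identical.

```python
import hashlib


def geometry_key(vertices, faces, scale_to_metres=1.0, decimals=4):
    """Hash a mesh after unit scaling and shifting it to a local origin.

    Hypothetical helper: the real pipeline's duplicate criterion is not
    described in the source text.
    """
    # Convert every coordinate to metres (SI) first.
    scaled = [tuple(c * scale_to_metres for c in v) for v in vertices]
    # Shift so the bounding-box minimum sits at the origin, which removes
    # the component's placement inside its parent model.
    mins = [min(v[i] for v in scaled) for i in range(3)]
    local = [
        tuple(round(v[i] - mins[i], decimals) for i in range(3))
        for v in scaled
    ]
    payload = repr((local, [tuple(f) for f in faces])).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def deduplicate(components):
    """Keep the first component seen for each (class, geometry hash) pair."""
    seen = set()
    unique = []
    for comp in components:
        key = (
            comp["ifc_class"],
            geometry_key(
                comp["vertices"],
                comp["faces"],
                comp.get("scale_to_metres", 1.0),
            ),
        )
        if key not in seen:
            seen.add(key)
            unique.append(comp)
    return unique
```

Rounding absorbs small floating-point differences between exports. A production version would likely need an order-independent signature, since two exports of the same component can list vertices differently.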
The dataset contains 1.3 million labeled components across 87 IFC classes, sourced from 1,607 real-world BIM models spanning 14 building types. To address class imbalance, we merge rare classes and downsample dominant ones, forming balanced subsets for robust model training. We report classification baselines across multiple modalities and models. Both the dataset and the full processing pipeline will be publicly released to support reproducibility and extension.
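The balancing strategy described above (merging rare classes and downsampling dominant ones) can be sketched as follows. This is an illustrative sketch only: the function name `balance`, the thresholds `min_count` and `max_count`, the catch-all label `"Other"`, and the use of uniform random sampling are assumptions, not details taken from the BIMCompNet pipeline.

```python
import random
from collections import Counter, defaultdict


def balance(labels, min_count, max_count, merged_label="Other", seed=0):
    """Return (index, label) pairs for a balanced subset.

    Classes with fewer than `min_count` samples are relabeled to
    `merged_label`; classes with more than `max_count` samples are
    randomly downsampled to `max_count`. All thresholds are hypothetical.
    """
    counts = Counter(labels)
    # Step 1: merge rare classes into a single catch-all label.
    merged = [
        lab if counts[lab] >= min_count else merged_label for lab in labels
    ]
    # Step 2: group sample indices by their (possibly merged) label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(merged):
        by_class[lab].append(idx)
    # Step 3: downsample dominant classes with a seeded RNG so the
    # subset is reproducible.
    rng = random.Random(seed)
    kept = []
    for idxs in by_class.values():
        if len(idxs) > max_count:
            idxs = rng.sample(idxs, max_count)
        kept.extend(idxs)
    kept.sort()
    return [(i, merged[i]) for i in kept]
```

For example, calling `balance(labels, min_count=2, max_count=5)` on a list with ten `IfcWall`, four `IfcDoor`, one `IfcRamp`, and one `IfcStair` labels keeps five walls, all four doors, and two samples under the merged label.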
Dataset Download
Full Dataset
Sampled Dataset
Model Checkpoints
BibTeX
If you find BIMCompNet useful in your research, please cite our work:
@inproceedings{BIMCompNet2025,
author = {Yang, Mingsong and Hei, Xinhong and Chen, Kehai and Meng, Haining and Dong, HaoYang and Zhao, Qin},
title = {BIMCompNet: Multimodal Dataset for Geometric Deep Learning in Building Information Model},
year = {2025},
isbn = {9798400720352},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3746027.3758238},
doi = {10.1145/3746027.3758238},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia},
pages = {12927--12933},
numpages = {7}
}
@article{IFCGeoNet2026,
author = {Yang, Mingsong and Hei, Xinhong and Meng, Haining and Chen, Kehai and Tong, Xinyu and Li, YuChao and Zhao, Qin},
title = {Lightweight framework for IFC element classification using multi-view and heterogeneous graph with decision-level fusion},
journal = {Automation in Construction},
volume = {181},
pages = {106660},
year = {2026},
issn = {0926-5805},
doi = {10.1016/j.autcon.2025.106660}
}
License
Our dataset is released under the MIT License.
Contact
If you have any questions, please contact us at yamiso@163.com.