
1 Introduction

Global energy consumption reached 162 thousand terawatt-hours in 2019. As the world’s largest economies, China and the United States account for more than 40% of global energy consumption [1]. The building sector accounts for 18.35% of China’s total energy consumption and 40% of that of the US [2, 3]. In the context of tackling climate change, effectively reducing building energy consumption has become an important topic in energy conservation and carbon reduction. Using the Building Information Model (BIM), the Semantic Web, the Internet of Things, and other information technologies to improve building operation is one of the key research areas.

BIM accumulates detailed information about a building from the early design stages onward. Industry Foundation Classes (IFC) is a common format for BIM. In addition to attribute information, IFC contains a large amount of 3D information, and the material and other data it carries can also be used for rendering 3D effects. Because the model itself is a large dataset, IFC is not an ideal index for managing data in the Operation and Maintenance (OM) phase.

The Semantic Web is a network of data that includes dates, titles, part numbers, chemical properties, and any other types of data. Brick and Haystack [4] are two universal semantic schemas for defining entities in buildings. Haystack tags resolve data silos among various subsystems (HVAC, lighting, and enterprise scheduling). Brick aims to standardize the semantic description of buildings, including physical, logical, virtual assets and the relationships between them. Using the Semantic Web, Brick can coherently describe many special and custom functions, assets, and subsystems throughout the building life cycle.

Using Brick or Haystack as a description specification for a building reduces the cost of deploying analytics, energy efficiency measures, and smart controls across buildings, and supports the integration of the numerous subsystems in a modern building: HVAC, lighting, fire protection, security, etc. It also simplifies the development of smart analytics and control applications and reduces reliance on the non-standard, unstructured labels specific to individual building management systems. However, the conversion process is still challenging.

1.1 Related Works

Many studies and applications currently address building compliance using ontologies, metadata, and the Semantic Web, but most of them target building design and model checking. These studies usually extract semantics from BIM models to generate graphs, or use BIM models directly as query objects, and then carry out deductions such as cost budgeting, energy design, and construction hazard identification based on standards [5,6,7].

Through the integrated application of BIM and Semantic Web technology, the project in [8] verifies that the design conforms to construction quality specifications. It automatically checks the size and position of the BIM model components against the requirements of the specification, thereby reducing the benchmarking workload of the relevant personnel during construction.

When semantics are extracted from BIM and compared with standards, the process usually introduces query and ontology description technologies such as SPARQL, Web Ontology Language (OWL), or Unified Modeling Language (UML) to improve the standardization and efficiency of retrieving BIM semantics. For instance, McGibbney and Kumar [9] summarized BIM model checking based on semantic implementation. The process first converts specifications and models to semantics and then extracts the BIM model based on specification requirements, which improves the efficiency of specification and model checking. Bi et al. [10] used ontology for knowledge expression to promote the protection of ancient buildings.

The current applications of the Semantic Web are similar to labeling entities in Brick and Haystack. Labels produce good guidance for data, but cannot effectively represent relationships between entities. Also, the knowledge graph exported from IFC at this stage contains too much redundant data, which reduces the efficiency of data management. Although the complex types of data in the Operation and Maintenance (OM) phase are more suitable for the Semantic Web to exert its value, few studies have applied it to OM because of the difficulties in creating a semantic web. The application of BIM in the building life cycle is continuous. After accumulating information in BIM, only part of the data will be reused in the OM phase.

However, BIM applications are mainly concentrated in stages such as model collision checking, compliance checking, and engineering calculation, and rarely extend to the OM stage, which causes massive remodeling work and wasted resources. It is necessary to standardize the repeated semantic descriptions in the model and convert the BIM into a standard semantic model for information indexing, which can reduce model loading, reduce resource consumption, and improve the efficiency of data sharing in the building operating system. Currently, the semantic simplification of BIM models is still done manually, which consumes a lot of manpower and time.

1.2 Contributions

This work explores a scheme for building a standard semantic model based on BIM. First, the entities are extracted from the BIM model, and their names are standardized or transformed with reference to the ontology dictionary. Then, the belonging and supply relationships between entities are inferred with the help of the knowledge graph. Finally, the custom entities in BIM are converted into standard semantic models, supporting the transition of building information from the early stages to OM.

2 Methodology

To create a standard conversion from BIM to the Semantic Web, this chapter proposes a method that uses natural language processing (NLP) to understand entities and infers the interrelationships between entities according to the knowledge graph. The implementation framework of this method is shown in Fig. 1.

Fig. 1. The method of generating standard SW model from BIM

2.1 Entity Extraction

The first step is to extract entities from the data source, such as the BIM model.

In existing research that converts BIM information to the Semantic Web, the conversion through Industry Foundation Classes (IFC) is geometric rather than semantic because of missing information. As a result, the relationships between entities cannot be inferred when exporting ifcOWL.

There are two options for extracting entities from the BIM model: extracting them directly through plug-ins, or converting the model into ifcOWL. The latter causes information loss during conversion, whereas extracting entities directly from the model avoids it.

Direct data extraction from the BIM model can use plug-ins such as Dynamo or Grasshopper, as shown in Fig. 2. First, the script traverses all the categories in the file (such as piping, mechanical, electrical, etc.). Then, it traverses the entities under each category. Finally, the program returns all the entity names and family categories. The built-in conditional grouping module of Dynamo can group entities according to their spatial positions. After deduplication, entities and their positional relationships can be obtained.

Fig. 2. Extracting entities based on location relationships from BIM
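The traversal, grouping, and deduplication steps described above can be sketched in plain Python. This is only an illustration of the logic, not the Dynamo or Grasshopper script itself: the `RECORDS` list is a hypothetical stand-in for whatever the extraction script returns, and every category, entity name, and room in it is made up.

```python
from collections import defaultdict

# Hypothetical records standing in for the output of an extraction script:
# (category, entity name, location). All values are made up for illustration.
RECORDS = [
    ("Mechanical Equipment", "AHU-1", "Room 101"),
    ("Mechanical Equipment", "AHU-1", "Room 101"),  # duplicate record
    ("Pipes", "CHW-Supply-01", "Room 101"),
    ("Electrical Equipment", "Panel-A", "Room 102"),
]


def group_by_location(records):
    """Group unique (category, name) pairs under each location."""
    grouped = defaultdict(set)
    for category, name, location in records:
        grouped[location].add((category, name))  # the set removes duplicates
    # Sort locations and entities so the output is stable and readable.
    return {loc: sorted(items) for loc, items in sorted(grouped.items())}


if __name__ == "__main__":
    for location, entities in group_by_location(RECORDS).items():
        print(location, entities)
```

Using a set per location keeps deduplication implicit, which mirrors the "group, then deduplicate" order in the text.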

As illustrated in Fig. 3, following similar settings, the extraction of entities can also be carried out according to the supply relationship. The script uses the third-party plug-in (Spring Node) to obtain the associated entities within the view. By setting the category of the parent equipment and the category of the equipment being supplied respectively, and removing the repeating entities, the supply relationships can be obtained.

Fig. 3. Extracting entities based on feeding relationships from BIM
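The category-based filtering behind the supply relationships can be illustrated in the same way. The sketch below assumes the associated entities are already available as connection records. The `CONNECTIONS` data, the category labels, and the function name are hypothetical, and the code does not call the plug-in mentioned above.

```python
# Hypothetical connection records: (upstream name, upstream category,
# downstream name, downstream category). All values are illustrative.
CONNECTIONS = [
    ("ODU-1", "Outdoor Unit", "IDU-134", "Indoor Unit"),
    ("ODU-1", "Outdoor Unit", "IDU-135", "Indoor Unit"),
    ("ODU-1", "Outdoor Unit", "IDU-134", "Indoor Unit"),  # repeated record
    ("Panel-A", "Electrical Panel", "ODU-1", "Outdoor Unit"),
]


def supply_pairs(connections, parent_category, child_category):
    """Return unique (parent, child) pairs for the chosen category pair."""
    seen = set()
    pairs = []
    for up_name, up_cat, down_name, down_cat in connections:
        if up_cat != parent_category or down_cat != child_category:
            continue  # keep only the requested parent/child categories
        pair = (up_name, down_name)
        if pair not in seen:  # drop repeats while keeping first-seen order
            seen.add(pair)
            pairs.append(pair)
    return pairs


if __name__ == "__main__":
    print(supply_pairs(CONNECTIONS, "Outdoor Unit", "Indoor Unit"))
```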

2.2 Text Transformation and Relationship Inference

The first step is to determine the entity name. If the names of the extracted entities are irregular, NLP is needed to map each name to its family name through cosine similarity, as shown in Fig. 4. Knowledge graphs use visualization techniques to describe knowledge resources and can be used to display, analyze, and reason about the interconnections between entities. Once an entity has been translated to its standard name, its relationships can be inferred from the position of that name in the knowledge graph. This relational reasoning relies mainly on SPARQL queries over the complete knowledge graph; relationships are then created for the newly generated entities according to the query results.

Fig. 4. The reasoning of relationships from custom entity names
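The idea of querying a complete knowledge graph and creating relationships for new entities can be sketched as follows. In place of a SPARQL engine, this sketch uses a plain Python triple-pattern lookup, where `None` plays the role of a query variable. The class names, the `feeds` relation, and the entity names are illustrative assumptions rather than content from the experiment.

```python
# Reference knowledge graph at class level: (subject class, relation, object class).
# Class and relation names are illustrative assumptions.
REFERENCE_GRAPH = {
    ("Chiller", "feeds", "AHU"),
    ("AHU", "feeds", "VAV"),
}


def match(graph, subject=None, relation=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard,
    similar to a variable in a SPARQL triple pattern."""
    return [
        (s, r, o)
        for (s, r, o) in sorted(graph)
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]


def infer_relations(entities, graph):
    """Create instance-level triples for entities whose classes are related
    in the reference graph. `entities` maps entity name -> standard class."""
    inferred = []
    for name, cls in sorted(entities.items()):
        for _, relation, target_cls in match(graph, subject=cls):
            for other, other_cls in sorted(entities.items()):
                if other_cls == target_cls:
                    inferred.append((name, relation, other))
    return inferred


if __name__ == "__main__":
    new_entities = {"CH-1": "Chiller", "AHU-3": "AHU"}
    print(infer_relations(new_entities, REFERENCE_GRAPH))
```

Matching on class alone links every entity of one class to every entity of the related class, so in practice the location and supply relationships extracted from BIM would be needed to narrow the candidates.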

A_i and B_i are the components of the vectors obtained from the entity name and from a term in the standard ontology dictionary, respectively. The cosine similarity is calculated for each dictionary term in turn, as shown in Eq. 1. The closer the cosine similarity is to 1, the closer the two terms are to each other; a value of −1 means the terms are opposite.

$$\text{similarity} = \frac{\sum_{i=1}^{n} A_{i} B_{i}}{\sqrt{\sum_{i=1}^{n} A_{i}^{2}}\,\sqrt{\sum_{i=1}^{n} B_{i}^{2}}}$$
(1)
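As a quick check of Eq. 1, consider two illustrative three-dimensional vectors, A = (1, 2, 2) and B = (2, 1, 2). They are chosen here only for easy arithmetic and are not taken from the experiment.

$$\text{similarity} = \frac{1\cdot 2 + 2\cdot 1 + 2\cdot 2}{\sqrt{1^{2}+2^{2}+2^{2}}\,\sqrt{2^{2}+1^{2}+2^{2}}} = \frac{8}{3\cdot 3} = \frac{8}{9} \approx 0.89$$

The value is close to 1, so under this measure the two vectors would be treated as similar.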

Through this cosine similarity evaluation, the entity name can be mapped to the standard vocabulary of the ontology dictionary. The framework uses PyTorch and transformers [11]. BERT converts the text into a sequence of tokens with a length of 128, and each token has its own 768-dimensional vector. Pooling averages the vectors of all tokens into a single 768-dimensional “sentence vector”. Based on the pretrained models, the program converts the non-standard vocabulary and the ontology vocabulary into vectors, uses PyTorch to calculate the cosine similarity, and finally selects the closest standard word.
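The pooling and selection steps can be illustrated without loading a pretrained model. The sketch below is a simplified stand-in, not the PyTorch implementation: it uses made-up three-dimensional token vectors instead of 768-dimensional BERT outputs, a padding mask to skip unused positions, and two hypothetical standard terms.

```python
import math


def mean_pool(token_vectors, mask):
    """Average token vectors, skipping padded positions (mask value 0)."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity as in Eq. 1 (zero vectors are not handled here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def closest_term(query_vec, vocabulary):
    """Return (term, score) for the vocabulary entry most similar to query_vec."""
    return max(
        ((term, cosine(query_vec, vec)) for term, vec in vocabulary.items()),
        key=lambda pair: pair[1],
    )


# Toy token vectors for one non-standard name; the last position is padding.
tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool(tokens, mask)

# Toy vectors for two standard terms (made-up values and names).
vocabulary = {
    "Luminaire": [1.0, 1.0, 0.0],
    "Chiller": [0.0, 0.0, 1.0],
}
best_term, best_score = closest_term(sentence_vec, vocabulary)

if __name__ == "__main__":
    print(sentence_vec, best_term, round(best_score, 3))
```

Ignoring padded positions matters because a fixed sequence length means short names are mostly padding, which would otherwise dominate the average.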

If the naming rules of the model are clear, for example RM_101_IDU_134_T1 referring to indoor unit No. 134 in Room 101, the name only needs to be converted into the standard ontology name according to the naming rules. The naming rules may already encode the location and supply relationships, in which case the relationships between entities can be parsed directly from the names. Figure 5 shows the process.

Fig. 5. The reasoning of entity relationships from formatted entity names
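For rule-conforming names, the parsing step can be sketched with a regular expression. The pattern below is an assumption inferred from the single example RM_101_IDU_134_T1; the field names, the code-to-class mapping, and the `hasLocation` relation label are hypothetical, and the meaning of the trailing suffix is left open because the text does not specify it.

```python
import re

# Assumed pattern for names such as RM_101_IDU_134_T1:
# RM_<room>_<equipment code>_<unit number>_<suffix>
NAME_RULE = re.compile(
    r"^RM_(?P<room>\d+)_(?P<code>[A-Z]+)_(?P<unit>\d+)_(?P<suffix>\w+)$"
)

# Hypothetical mapping from equipment codes to standard class names.
CODE_TO_CLASS = {"IDU": "Indoor_Unit"}


def parse_name(name):
    """Split a rule-conforming name into fields and derived triples.
    Returns None when the name does not follow the assumed rule."""
    m = NAME_RULE.match(name)
    if m is None:
        return None
    entity = f"{m['code']}_{m['unit']}"
    room = f"Room_{m['room']}"
    return {
        "entity": entity,
        "class": CODE_TO_CLASS.get(m["code"]),  # None if the code is unknown
        "suffix": m["suffix"],  # meaning not specified in the text
        "triples": [(entity, "hasLocation", room)],
    }


if __name__ == "__main__":
    print(parse_name("RM_101_IDU_134_T1"))
```

Names that do not match return `None`, which marks them as candidates for the similarity-based route instead.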

By comparing the inferred relationships with a complete knowledge graph (such as the knowledge graph of an air-conditioning water-cooling system), the interrelationships between entities can be deduced, and a standardized semantic model can be generated.

3 Experiment Results and Discussions

An experiment was conducted on an HVAC file in Revit. The program extracts instances from the BIM file and classifies them into 42 categories. 25% of the categories, namely the data sources and control units, are covered by Brick or Haystack. The instances are classified according to the ontology standards, and the results are shown in Table 1. The extracted instances are mainly HVAC and electrical equipment. The entity names may not be standardized; for instance, Trck_BswySystms_Cooper_RSA_Profile Series_AR111 Closed Back Integral Xfmr is a lighting device.

Table 1. The words covered in Brick and Haystack

Figure 6 shows the results of matching with the pretrained language models against the Brick and Haystack standards. SentenceTransformer loads each pretrained model [12] and calculates the most similar word. The results show that even models specially tuned for similarity calculation perform poorly. The accuracy reaches only 60%, and the same model is not stable across different ontology standards. Paraphrase-mpnet-base-v2, which performs best in classifying and finding similar texts, shows a 40% difference in accuracy between the Brick standard and the Haystack standard. However, this problem should improve as pretrained models are strengthened for buildings and equipment, which is also indicated by the results of the different pretrained models.

Fig. 6. The results of corrections

As the Brick vocabulary results show, although the natural language models have been optimized on their respective training sets, their performance is still unstable in practical application, especially in fields involving specific professional knowledge.

Non-standard entity names require semantic understanding. Because the ontology standards cover only a small part of the BIM model and the pretrained models have not been strengthened for buildings and equipment, the accuracy of standardized text translation is low. As a result, the level of automation in generating a standardized semantic web from non-standard named entities in a BIM file is also low, and manual participation is still needed at present.

When NLP is used to understand professional knowledge, it still needs to be constrained by a large amount of relevant domain knowledge. For example, transforming the non-standard language descriptions in BIM into the ontology standard in the HVAC field, as explored in this chapter, requires HVAC professional knowledge to constrain machine reading and understanding and to optimize the training set.

However, the pretrained model has not been optimized enough, and there is a large deviation between the results of machine understanding and the actual situation. The accuracy of BIM + NLP in converting non-standard entity names needs to be improved, and the current model also needs to strengthen the language recognition of professional vocabulary in the field of architecture and HVAC.

4 Conclusions

Based on BIM technology, this work studies a method for the standard semantic description of BIM entities, including the extraction of entities and their relationships from BIM and the transformation of entity names through NLP-based semantic understanding.

Currently, standardizing BIM entity naming rules is an effective means of supporting this conversion. Due to the low accuracy in understanding professional HVAC knowledge, the conversion of BIM entities without clear naming rules is still dominated by manual work, and automation is limited. In future research, optimizing the pretrained model with professional knowledge will be the key to improving the NLP translation results. With the assistance of ERNIE 3.0 or GPT-3, semantic understanding should become more accurate during BIM to semantic web conversions.