Funding: This work is supported by the National Institute of Environmental Health Sciences (NIEHS) Award 0255-0236-4609/1U2CES026555-01, IBM Research AI through the AI Horizons Network, and the CAPES Foundation Senior Internship Program Award 88881.120772/2016-01.
Abstract: It is common practice for data providers to include text descriptions for each column, in the form of data dictionaries, when publishing data sets. While these documents help an end user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project.
Funding: the Data Integration and Analysis System (DIAS) Project.
Abstract: The Ontology registry system is developed to collect, manage, and compare ontological information for integrating global observation data. Data sharing and data services, such as support for metadata design, structuring of data contents, and support for text mining, are applied to improve data use and data interoperability. A semantic network dictionary and gazetteers are constructed as a trans-disciplinary dictionary. Ontological information is added to the system by digitizing text-based dictionaries, developing a 'knowledge writing tool' for experts, and extracting semantic relations from authoritative documents with natural language processing techniques. The system is developed to collect lexicographic and geographic ontologies.