Research

The Materials Data Engineering (MaDE) laboratory is a collaborative effort between the Materials Design and Innovation (MDI) department, the Center for Unified Biometrics and Sensors (CUBS) and the Center for Computational Research (CCR) at the University at Buffalo (UB). It is funded by the National Science Foundation (NSF) as part of the Data Infrastructure Building Blocks (DIBBs) program within the Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) initiative.

Our mission is to develop a machine learning framework that helps materials scientists accelerate the discovery of advanced materials. The framework will comprise several synergistic building blocks that enable materials scientists to quickly design and develop more efficient materials with the help of materials data, machine learning models and high-performance computing resources. These building blocks include tools for processing documents and images to extract experimental information from handbooks, journals and similar sources; using existing materials data from online repositories; applying different machine learning models to find new insights; comparing how well those models predict properties; and visualizing results to better understand physical phenomena.

  • Machine Learning

    Computational methods in materials science have been gaining prominence in recent years and have been experimentally verified for several materials of great practical importance. Computations often rely on first-principles methods and become expensive as the chemistry of the material becomes more complex. We wish to investigate the use of machine learning methods that take advantage of already computed values in order to reduce the computational cost of first-principles methods without compromising accuracy. We aim to provide a toolkit that allows a researcher to try several 'fingerprint' extraction and machine learning algorithms on their data in a 'plug-and-play' fashion. Since a large number of open-source materials databases exist, and with them access to large amounts of materials data, such a toolkit would be a valuable resource for research and education. Additionally, by providing access to different kinds of descriptors, this toolkit will enable a broader research audience to apply ML algorithms to a wider variety of materials science data than is currently possible.

  • Document Analysis

    Graphs and charts are a compact visual representation of the data, results and conclusions of research, and they form a key component of the published literature. Most popular search engines largely ignore graphs and scientific plots, which have received only moderate attention in recent years, and working applications that can extract relevant information from them in a fully automated manner are lacking. Under this grant, we have tackled this problem via two approaches: 'top-down' (processing charts in general by understanding common sub-elements of graphs such as text, legends and axes) and 'bottom-up' (processing specialized, domain-specific charts such as phase diagrams). Apart from scientific charts, we also focus on other non-traditional sources of information, such as lecture and presentation videos, where we attempt to extract text, mathematical expressions and figures from slides or whiteboard content within the video to support downstream recognition and search applications.

  • Visualization

    Data extracted from documents, and the models and results obtained from machine learning predictions, are of little use to a researcher if they are not presented in a clear and concise manner. We will work with the materials community to design an interface in which the data and results from our tools can be prepared and presented in a way that makes sharing, publication and collaboration easy.
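
As a rough illustration of the 'plug-and-play' idea described under Machine Learning, the Python sketch below registers a few interchangeable fingerprint functions and models and scores every combination on the same data. It is not the MaDE toolkit's API: every name in it (parse_formula, FINGERPRINTS, MODELS, compare), the short element list and the two toy models are hypothetical, and the target value is synthetic (the number of atoms in the formula) so that no real property data is implied.

```python
import re
from itertools import product

# Illustrative subset of elements; a real fingerprint would cover many more.
ELEMENTS = ["Fe", "O", "Na", "Cl", "Si", "Ti"]


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + float(number or 1)
    return counts


def count_fingerprint(formula):
    """Fingerprint 1: raw atom counts per element."""
    counts = parse_formula(formula)
    return [counts.get(e, 0.0) for e in ELEMENTS]


def fraction_fingerprint(formula):
    """Fingerprint 2: atomic fractions per element."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(e, 0.0) / total for e in ELEMENTS]


class MeanModel:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborModel:
    """Predicts the target of the closest training fingerprint."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X_]
            preds.append(self.y_[dists.index(min(dists))])
        return preds


# Registries: adding a fingerprint or model is a one-line change.
FINGERPRINTS = {"counts": count_fingerprint, "fractions": fraction_fingerprint}
MODELS = {"mean": MeanModel, "1-nn": NearestNeighborModel}


def compare(train, test):
    """Return the mean absolute error for every fingerprint/model pair."""
    results = {}
    for (fp_name, fp), (m_name, model_cls) in product(
        FINGERPRINTS.items(), MODELS.items()
    ):
        model = model_cls().fit([fp(f) for f, _ in train], [t for _, t in train])
        preds = model.predict([fp(f) for f, _ in test])
        errors = [abs(p - t) for p, (_, t) in zip(preds, test)]
        results[(fp_name, m_name)] = sum(errors) / len(errors)
    return results


def atoms_per_formula(formula):
    """Synthetic target used only for this demonstration."""
    return sum(parse_formula(formula).values())


train = [(f, atoms_per_formula(f)) for f in ["Fe2O3", "NaCl", "SiO2", "TiO2", "FeO"]]
test = [(f, atoms_per_formula(f)) for f in ["Fe3O4", "Na2O"]]
results = compare(train, test)

for (fp_name, m_name), mae in sorted(results.items()):
    print(f"{fp_name:10s} {m_name:5s} MAE = {mae:.2f}")
```

Keeping fingerprints and models in separate registries means each can be swapped without touching the other, which is one simple way to get the side-by-side comparison of predictive performance mentioned above.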