Domain-specific Chart Processing

In this approach, we used known contextual knowledge about the domain to synthesize information from diagrams. Specialised rules may be required in such cases to interpret the diagrams. Phase diagrams are an important type of diagram in the materials science literature. We laid the groundwork for a large-scale, indexable, digitized database of phase diagrams covering different thermodynamic conditions and compositions for a wide variety of materials.

Phase diagrams map phase stability as a function of extrinsic variables such as chemical composition, temperature and/or pressure, and therefore provide the equilibrium phase compositions and ratios under variable thermodynamic conditions. The geometrical characteristics of phase diagrams, including the shape of phase boundaries and the positions of phase boundary junctions, have fundamental thermodynamic origins. Hence they serve as a visual signature of the thermo-chemical properties of alloys. The design of alloys, for instance, relies on inspecting many such documented phase diagrams, which is usually a manual process.

We concentrated on approximately 80 thermodynamic phase diagrams of binary metallic alloy systems, which give phase information for multi-component systems at varying temperatures and mixture ratios. We used image processing techniques to isolate phase boundaries and subsequently extract areas of the same phase. Simultaneously, document analysis techniques were employed to recognize and group the text used to label the phases; text present along the axes was identified so as to map image coordinates to physical coordinates. Phases of unlabeled regions were inferred using standard rules.

Our objective was to develop an automated document recognition tool that can process large quantities of phase diagrams in order to support user queries which, in turn, facilitate the simultaneous screening of a large number of materials without loss of information. From a document analysis perspective, a phase diagram can be seen to consist mainly of alphanumeric text, often with accompanying Greek characters, in vertical and horizontal orientations; bounded regions of uniform phase within the plot; and descriptions of axes and numerical quantities along the axes.
 

Fig. Example of a challenging phase diagram.

As the figure above shows, narrow and small phase regions, the presence of arrows, text located very close to phase boundaries, and varying text orientations pose steep challenges to automated analysis. The key steps in automated phase diagram analysis are detailed below.

Preprocessing: Every image is binarized using Otsu's algorithm, and we then make sure that the background pixels are off and the foreground pixels are on. Contours are then obtained from the binarized image using a border following algorithm. These contours correspond to either the boundaries of uniform phase regions or the text elements in the graph.
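The binarization step can be illustrated with a minimal pure-Python sketch of Otsu's threshold selection. The function names are hypothetical, and the polarity rule (the majority class is treated as background) is an assumption made for this illustration rather than a detail stated in the text; border following is not shown.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the low class, pixels > t the high class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image):
    """Binarize a grayscale image (list of rows) so that foreground is 1.

    Assumption: the background covers most of the image, so the majority
    class is set to 0 (off) and the minority class to 1 (on).
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    binary = [[1 if p > t else 0 for p in row] for row in image]
    on = sum(sum(row) for row in binary)
    if on > len(flat) - on:  # majority is "on", so invert the polarity
        binary = [[1 - v for v in row] for row in binary]
    return binary
```

For a typical scanned diagram with dark lines on a white page, the inversion branch is the one that fires, leaving the lines and text as foreground.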

Contour Classification: Geometric and shape-based features suitable for shape classification [1] are extracted from the contours, and a model is trained to distinguish between text and phase region contours. Training data was collected by manually annotating the phase diagram plot contours using an in-house tool. The model is then used to separate the extracted contours into phase and text.
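To make the idea of geometric contour features concrete, the sketch below computes a few generic shape descriptors. The particular features chosen here (area, perimeter, aspect ratio, extent, circularity) are assumptions for illustration; the feature set actually used is described in [1], and the function name is hypothetical.

```python
import math


def contour_features(contour):
    """Compute simple shape descriptors for a closed contour.

    contour: list of (x, y) vertices in order, first vertex not repeated.
    """
    n = len(contour)
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        area2 += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(area2) / 2.0

    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    box_area = width * height

    return {
        "area": area,
        "perimeter": perimeter,
        # Width over height of the bounding box.
        "aspect_ratio": width / height if height else 0.0,
        # Fraction of the bounding box covered by the contour.
        "extent": area / box_area if box_area else 0.0,
        # 1.0 for a circle, smaller for elongated or ragged shapes.
        "circularity": 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0,
    }
```

A feature dictionary like this could be turned into a vector and passed to any standard classifier trained on the annotated contours.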

Grouping text: The contour extraction algorithm extracts single characters or groups of characters from text regions, depending on the quality of the input graph images. We group these segmented characters into meaningful units such as words and formulas. This is carried out by analyzing the gaps and relative widths of the text contours using the procedure detailed in our work [1]. The procedure also inherently sorts the words into horizontal and vertical orientations using the recognition confidence of each word.
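As a rough illustration of gap-based grouping, the sketch below splits one horizontal line of character bounding boxes into words wherever the gap between neighbours is large relative to the typical character width. The `gap_factor` value and the function name are assumptions; the actual procedure is the one detailed in [1], and vertical text is not handled here.

```python
from statistics import median


def group_into_words(boxes, gap_factor=0.6):
    """Group character boxes on one horizontal text line into words.

    boxes: list of (x, y, width, height) bounding boxes.
    gap_factor: hypothetical tuning parameter; a gap wider than
        gap_factor * median character width starts a new word.
    """
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[0])  # left to right
    typical_width = median(b[2] for b in boxes)
    words = [[boxes[0]]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - (prev[0] + prev[2])  # space between neighbours
        if gap > gap_factor * typical_width:
            words.append([cur])
        else:
            words[-1].append(cur)
    return words
```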

Parsing axes: Axes are determined using the Hough transform to detect lines. Text contours outside of the axes are recognized to define the X-axis and Y-axis ranges, as well as the quantities plotted in the current phase diagram. The X-axis label also gives us the particular binary system being described in the graph. This lets us translate image coordinates (x, y) into the physical coordinates described in the phase diagram; in our case, (mole fraction, temperature).
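The translation from image to physical coordinates can be sketched as a linear map per axis, calibrated from two recognized tick labels. This assumes both axes are linear; the function name and the tick positions in the usage example are made up for illustration.

```python
def make_axis_map(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a physical value.

    Calibrated from two tick marks: pixel_a reads value_a and pixel_b
    reads value_b. Assumes a linear axis.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must be at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


# Hypothetical calibration: mole fraction 0..1 spans pixels 100..600,
# and temperature 200..1100 spans pixel rows 500..50 (image y grows down).
x_to_fraction = make_axis_map(100, 0.0, 600, 1.0)
y_to_temperature = make_axis_map(500, 200.0, 50, 1100.0)


def to_physical(point):
    """Convert an image point (x, y) to (mole fraction, temperature)."""
    x, y = point
    return x_to_fraction(x), y_to_temperature(y)
```

Because image rows increase downward, the Y-axis map naturally gets a negative scale and no separate flip is needed.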

Detection of arrows: Arrows are generally used in graphs to avoid clutter and to label hard-to-reach regions. Since our goal is to match the text labels and phase regions, detecting arrows, if any are present, is critical. In our dataset, we observed arrows with lengths ranging from tens to hundreds of pixels. We use the Hough line transform to detect short straight line segments and use an algorithm to merge collinear and overlapping segments to isolate arrows in the image. We determine the tail and head of each arrow using a simple centroid measurement to establish its direction.
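One plausible reading of the centroid measurement is sketched below: the arrowhead adds extra pixels near one end, so the centroid of all arrow pixels is pulled toward the head. This interpretation and the function name are assumptions for illustration, not a description of the exact rule used.

```python
import math


def arrow_head_and_tail(pixels, end_a, end_b):
    """Return (head, tail) for an arrow given its pixels and two endpoints.

    Assumption: the arrowhead contributes extra foreground pixels, so the
    endpoint closer to the pixel centroid is taken to be the head.
    """
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    dist_a = math.hypot(cx - end_a[0], cy - end_a[1])
    dist_b = math.hypot(cx - end_b[0], cy - end_b[1])
    return (end_a, end_b) if dist_a < dist_b else (end_b, end_a)
```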

Matching labels and regions: We first associate text label regions with phase regions by comparing the distances between their centroids. Two text labels associated with the same region are marked as conflicting. Conflicts are resolved by checking for the presence of arrows, as well as by checking whether the labels are vertically oriented and looking for vertical lines nearby. Once all conflicts are resolved, the phases of regions with no label are inferred using neighborhood information as well as the standard phase diagram interpretation rules.
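The first pass of this matching can be sketched as a nearest-centroid assignment that also reports conflicts and unlabeled regions. The data layout and function names are hypothetical, the region centroid is approximated by the mean of the contour vertices, and the arrow-based and rule-based resolution steps are not shown.

```python
import math
from collections import defaultdict


def centroid(points):
    """Mean of the contour vertices, used as an approximate centroid."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def match_labels_to_regions(labels, regions):
    """Assign each label to the region with the nearest centroid.

    labels: dict mapping label text to its (x, y) centroid.
    regions: dict mapping region id to its contour (list of points).
    Returns (matched, conflicts, unlabeled).
    """
    centers = {rid: centroid(pts) for rid, pts in regions.items()}
    assigned = defaultdict(list)
    for text, (lx, ly) in labels.items():
        nearest = min(
            centers,
            key=lambda rid: math.hypot(lx - centers[rid][0], ly - centers[rid][1]),
        )
        assigned[nearest].append(text)

    matched = {rid: t[0] for rid, t in assigned.items() if len(t) == 1}
    # Two or more labels on one region need further resolution.
    conflicts = {rid: t for rid, t in assigned.items() if len(t) > 1}
    # Regions with no label are left for rule-based inference.
    unlabeled = [rid for rid in regions if rid not in assigned]
    return matched, conflicts, unlabeled
```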

After this procedure is completed, for every phase diagram, we get a list of phase region labels (both direct and inferred) and the corresponding uniform phase regions. These phase regions are described by their contour or boundary (a list of image coordinates). We also have axes text which lets us translate these image coordinates into physical coordinates, and tells us the elements in the binary system.

Once a phase diagram is thus digitized, we are able to provide the phase of any material present in our database at any given temperature and alloy mixture ratio. Using the digitized data, more complex queries may also be supported in the future. We evaluated our system by measuring the correctness of the phase region labels and obtained an accuracy of about 94%.
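A query of this kind can be sketched as a point-in-polygon test over the digitized regions, here using standard ray casting. The region data in the usage note is invented for illustration, and the function names are hypothetical; this is not the query interface of the actual system.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def phase_at(composition, temperature, regions):
    """Return the label of the region containing the query point.

    regions: dict mapping phase label to a contour given in physical
    coordinates, i.e. (mole fraction, temperature) pairs.
    Returns None if the point lies outside every region.
    """
    for label, polygon in regions.items():
        if point_in_polygon(composition, temperature, polygon):
            return label
    return None
```

For example, with a made-up system whose liquid region spans temperatures 800 to 1200 across all compositions, `phase_at(0.5, 1000, regions)` would return the liquid label.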

Further, from the phase diagram images, we readily identify specific types of phase boundary junctions, known as ‘eutectic points’. Since we can query and obtain the liquid phase region of every system in our database, we can then analyze its contour to detect points of interest. We have used this test case to show that we can characterize the shape of eutectic points and give a meaning to the term ‘deep eutectic’. Deep eutectics are known to be critical for the formation of metallic glasses (i.e., metallic systems without crystalline order), although previously no clear meaning of deep eutectic had been defined in terms of identifying new compounds [2]. Our work was then used to detect these eutectic points and angles on the contour graphs using the digitized information. This led to the accelerated discovery of 38 previously unexplored metallic glass forming compounds.
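As a simplified illustration of analyzing the liquid region's contour, the sketch below looks for interior local minima along its lower boundary and measures the angle between the two branches meeting there. Treating a eutectic candidate as a plain local minimum is an assumption made for this sketch; the criteria actually used are described in [2]. The angle depends on how the axes are scaled, so the input is assumed to be normalized to comparable ranges.

```python
import math


def eutectic_candidates(liquidus):
    """Find local minima on a liquidus curve and the angle at each.

    liquidus: list of (composition, temperature) points sorted by
        composition, with both axes scaled to comparable ranges.
    Returns a list of (composition, temperature, angle_in_degrees).
    """
    found = []
    for i in range(1, len(liquidus) - 1):
        x0, t0 = liquidus[i - 1]
        x1, t1 = liquidus[i]
        x2, t2 = liquidus[i + 1]
        if t1 < t0 and t1 < t2:  # strictly lower than both neighbours
            ax, ay = x0 - x1, t0 - t1  # vector to the left neighbour
            bx, by = x2 - x1, t2 - t1  # vector to the right neighbour
            cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
            cos = max(-1.0, min(1.0, cos))  # guard against rounding
            found.append((x1, t1, math.degrees(math.acos(cos))))
    return found
```

Under this reading, a smaller angle corresponds to a sharper, deeper-looking junction on the curve.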

References

  1. B. U. Kota, R. R. Nair, S. Setlur, A. Dasgupta, S. Broderick, V. Govindaraju, and K. Rajan, “Automated analysis of phase diagrams,” in Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on, vol. 2, pp. 17–18, IEEE, 2017.

  2. A. Dasgupta, S. R. Broderick, C. Mack, B. U. Kota, R. Subramanian, S. Setlur, V. Govindaraju, and K. Rajan, “Probabilistic assessment of glass forming ability rules for metallic glasses aided by automated analysis of phase diagrams,” Scientific Reports, vol. 9, 357, 2019.