upxo.viz.vizDistr module

vizDistr.py — Distribution visualisation for UPXO grain structure analyses.

Provides the DistrViz class for plotting scalar grain property distributions (area, perimeter, aspect ratio, …) and angular misorientation distributions (MDF). It is designed to complement ebsdviz.plot_mdf: use DistrViz.plot_mdf when peaks are not yet computed, and use ebsdviz.plot_mdf for a fully annotated MDF with peak labels and KDE from the peaks dict.

Typical usage

Grain size:

    dv = DistrViz(areas, label='Grain area', units='µm²')
    fig, ax = dv.plot_hist(bins=40, show_kde=True, step_size=rdr.step_size)
    plt.show()
    dv.print_stats()

MDF (lightweight, no peaks dict required):

    dv = DistrViz.from_mdf(mdf)
    fig, ax = dv.plot_mdf(mdf)
    plt.show()

Multiple properties:

    fig, axes = DistrViz.multi(
        {'Grain area': areas, 'Aspect ratio': ar, 'Perimeter': perim},
        units_dict={'Grain area': 'µm²', 'Aspect ratio': '', 'Perimeter': 'µm'},
        step_size=rdr.step_size,
    )
    plt.show()

class upxo.viz.vizDistr.DistrViz(data, label='value', units='')

Bases: object

Distribution visualiser for scalar grain properties and MDF data.

Parameters:

data (array-like) – 1-D array of values. NaN/Inf are stripped automatically.
label (str) – Property name — used in axis labels and titles.
units (str) – Unit string (e.g. ‘µm²’, ‘°’). Appended to x-label when non-empty.
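The automatic NaN/Inf stripping can be pictured with a minimal standard-library sketch. The helper name strip_nonfinite is hypothetical and not part of UPXO, and the class itself may well use a NumPy mask instead.

```python
import math

def strip_nonfinite(values):
    """Return a list of floats with NaN and +/-Inf entries removed.

    Hypothetical helper for illustration only; not part of upxo.
    """
    return [float(v) for v in values if math.isfinite(v)]

# Placeholder input mixing valid values with NaN and infinities.
raw = [12.5, float("nan"), 3.0, float("inf"), 7.25, float("-inf")]
clean = strip_nonfinite(raw)
print(clean)
```

Only the three finite values survive, so downstream statistics and plots are not distorted by missing or overflowed measurements.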

classmethod from_mdf(mdf)

Build from an mdf dict (output of compute_mdf_from_quats).

property stats

Dict of descriptive statistics computed from self.data.

print_stats()

Print a compact statistics summary to stdout.
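The reference does not list which statistics the dict holds. As a rough sketch of what such a summary could look like, the following uses only the standard library; the function name describe and every key name are assumptions, not the documented contents of stats.

```python
import statistics

def describe(values):
    """Descriptive statistics for a non-empty 1-D sequence.

    Key names are illustrative assumptions, not the keys used by DistrViz.
    """
    vals = sorted(float(v) for v in values)
    return {
        "n": len(vals),
        "mean": statistics.fmean(vals),
        "median": statistics.median(vals),
        "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
        "min": vals[0],
        "max": vals[-1],
    }

summary = describe([2.0, 4.0, 4.0, 6.0, 9.0])  # placeholder values
for key, value in summary.items():
    print(f"{key:>6}: {value:.3f}")
```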

plot(vis='hist', bins=40, show_kde=True, show_stats=True, color='steelblue', figsize=(7, 4), log_scale=False, step_size=None, bw_method='scott', fill=True, ax=None)

Unified plot dispatcher — routes to plot_hist, plot_kde, or plot_hist_kde based on vis.

Parameters:

vis (str) – 'hist', 'kde', or 'hist_kde'.
bins (int) – Histogram bin count (used by 'hist' and 'hist_kde').
show_kde (bool) – KDE overlay on histogram ('hist' only).
show_stats (bool) – Annotate mean / median lines.
color (str)
figsize (tuple)
log_scale (bool) – Log x-axis ('hist' only).
step_size (float or None) – Appended to x-label when provided.
bw_method (str or float) – KDE bandwidth selector ('kde' only).
fill (bool) – Fill KDE area ('kde' only).
ax (Axes or None)

Return type:

fig, ax
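One common way to build such a dispatcher is a lookup table from the vis string to a bound method. The toy class below is a stand-in written for illustration, not the DistrViz source; the real method presumably also filters keywords so that each target only receives the options it accepts.

```python
class MiniViz:
    """Toy stand-in for DistrViz showing one way to route on `vis`."""

    def plot_hist(self, **kw):
        return "hist", kw

    def plot_kde(self, **kw):
        return "kde", kw

    def plot_hist_kde(self, **kw):
        return "hist_kde", kw

    def plot(self, vis="hist", **kw):
        # Map each accepted `vis` value to the method that handles it.
        routes = {
            "hist": self.plot_hist,
            "kde": self.plot_kde,
            "hist_kde": self.plot_hist_kde,
        }
        if vis not in routes:
            raise ValueError(f"vis must be one of {sorted(routes)}, got {vis!r}")
        return routes[vis](**kw)

viz = MiniViz()
print(viz.plot(vis="kde", fill=True))
```

Rejecting unknown values up front gives a clearer error than silently falling back to a default plot type.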

plot_hist(bins=40, show_kde=True, show_stats=True, color='steelblue', figsize=(7, 4), log_scale=False, step_size=None, ax=None)

Histogram with optional KDE overlay and mean/median annotations.

Parameters:

bins (int)
show_kde (bool) – KDE curve scaled to match histogram counts.
show_stats (bool) – Draw vertical mean and median lines.
color (str)
figsize (tuple)
log_scale (bool) – Log x-axis.
step_size (float or None) – EBSD step size — appended to x-label when provided.
ax (Axes or None)

Return type:

fig, ax
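Scaling a KDE curve to histogram counts usually means multiplying the density by n × bin width, since a count histogram has total bar area n × bin width while a density integrates to 1. The sketch below checks that identity numerically with a hand-written Gaussian KDE. The sample, the bandwidth, and the helper name are all assumptions, and the module's own scaling code is not shown in this reference.

```python
import math

def gaussian_kde_1d(sample, bandwidth):
    """Return a Gaussian-kernel density function that integrates to 1."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density

sample = [1.0, 1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]  # placeholder values
bins = 4
lo, hi = min(sample), max(sample)
bin_width = (hi - lo) / bins

density = gaussian_kde_1d(sample, bandwidth=0.5)  # bandwidth chosen arbitrarily
scale = len(sample) * bin_width  # converts density to expected counts per bin

# Evaluate the scaled curve on a grid padded by 6 bandwidths on each side.
n_pts = 1000
dx = (hi - lo + 6.0) / n_pts
xs = [lo - 3.0 + i * dx for i in range(n_pts + 1)]
counts_curve = [scale * density(x) for x in xs]

# The area under the scaled curve matches the total bar area n * bin_width.
area = sum(counts_curve) * dx
print(f"curve area = {area:.4f}, n * bin_width = {scale:.4f}")
```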

plot_kde(bw_method='scott', fill=True, color='steelblue', show_stats=True, figsize=(7, 4), step_size=None, ax=None)

Pure KDE plot (probability density).

Parameters:

bw_method (str or float) – Bandwidth selector passed to scipy.stats.gaussian_kde.
fill (bool) – Fill area under the KDE curve.
color – Standard plot options.
figsize – Standard plot options.
step_size – Standard plot options.
ax – Standard plot options.

Return type:

fig, ax
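For 1-D data, scipy.stats.gaussian_kde with bw_method='scott' uses a factor of n^(-1/5) multiplied by the sample standard deviation (ddof=1) as the kernel width, and a float bw_method replaces that factor directly. The standard-library sketch below reproduces the arithmetic so the effect of the setting can be estimated without plotting. The function names are hypothetical.

```python
import statistics

def scott_bandwidth(sample):
    """Kernel standard deviation under Scott's rule for 1-D data.

    Mirrors the 1-D case of scipy.stats.gaussian_kde(bw_method='scott'):
    factor n**(-1/5) times the sample standard deviation (ddof=1).
    """
    n = len(sample)
    return n ** (-1.0 / 5.0) * statistics.stdev(sample)

def scalar_bandwidth(sample, factor):
    """Kernel standard deviation when bw_method is given as a float factor."""
    return factor * statistics.stdev(sample)

sample = [float(i) for i in range(32)]  # placeholder data, n = 32
print(f"Scott: {scott_bandwidth(sample):.4f}")
print(f"factor 0.2: {scalar_bandwidth(sample, 0.2):.4f}")
```

A smaller factor gives a narrower kernel and a bumpier curve; a larger one smooths out genuine modes.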

plot_hist_kde(bins=40, color='steelblue', show_stats=True, figsize=(7, 4), step_size=None, ax=None)

Density-normalised histogram with KDE overlay.

Return type:

fig, ax
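Density normalisation divides each bin count by n × bin width, so the bar areas sum to 1 and the bars share a y-axis with a KDE curve. A minimal sketch, assuming equal-width bins and a sample whose maximum exceeds its minimum (the helper name is hypothetical):

```python
def density_histogram(sample, bins):
    """Equal-width histogram normalised so that bar areas sum to 1."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in sample:
        # The maximum value belongs to the last bin (right edge inclusive).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(sample)
    heights = [c / (n * width) for c in counts]
    return heights, width

# Placeholder sample: two values fall into each of the four unit-width bins.
heights, width = density_histogram([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0], bins=4)
total_area = sum(h * width for h in heights)
print(heights, total_area)
```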

plot_mdf(mdf, show_csl=True, show_stats=True, angle_max=65.0, figsize=(8, 4), ax=None)

Bar-chart MDF from a pre-computed mdf dict with optional CSL markers.

Lighter alternative to ebsdviz.plot_mdf — does not require the peaks dict. Use ebsdviz.plot_mdf when peak labels and KDE are needed.

Parameters:

mdf (dict) – Output of compute_mdf_from_quats. Required keys: ‘hist_bin_centers’, ‘hist_density’, ‘hist_bin_edges’, ‘n_pairs’, ‘mean_angle’, ‘std_angle’.
show_csl (bool) – Draw dashed vertical lines at common cubic CSL angles.
show_stats (bool) – Annotate mean ± std in the legend.
angle_max (float) – X-axis upper limit (degrees).
figsize (tuple)
ax (Axes or None)

Return type:

fig, ax
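Before calling plot_mdf it can help to confirm that the dict carries every required key, and to see which CSL markers would fall inside the plotted range. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the dict values are arbitrary placeholders, and the CSL table lists well-known low-Σ cubic misorientation angles without claiming that these are the ones the module draws.

```python
REQUIRED_MDF_KEYS = (
    "hist_bin_centers", "hist_density", "hist_bin_edges",
    "n_pairs", "mean_angle", "std_angle",
)

# Misorientation angles (degrees) of a few low-Sigma cubic CSL boundaries.
# Which of these the plot actually marks is an assumption.
CSL_ANGLES_DEG = {
    "Sigma3": 60.0,
    "Sigma5": 36.87,
    "Sigma7": 38.21,
    "Sigma9": 38.94,
    "Sigma11": 50.48,
}

def missing_mdf_keys(mdf):
    """Return the required keys that are absent from an mdf dict."""
    return [key for key in REQUIRED_MDF_KEYS if key not in mdf]

def csl_within(angle_max):
    """CSL markers whose angle does not exceed the x-axis limit."""
    return {name: a for name, a in CSL_ANGLES_DEG.items() if a <= angle_max}

# Arbitrary placeholder values, not real measurements.
mdf = {
    "hist_bin_centers": [2.5, 7.5, 12.5],
    "hist_density": [0.02, 0.05, 0.13],
    "hist_bin_edges": [0.0, 5.0, 10.0, 15.0],
    "n_pairs": 3,
    "mean_angle": 9.1,
    "std_angle": 3.2,
}
print(missing_mdf_keys(mdf))
print(sorted(csl_within(45.0)))
```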

classmethod multi(data_dict, units_dict=None, step_size=None, bins=40, show_kde=True, show_stats=True, ncolumns=2, figsize_per=(5, 3.5), color='steelblue', log_scale=False)

Plot distributions for multiple grain properties in a subplot grid.

Parameters:

data_dict (dict) – {label: array-like} of grain properties to plot.
units_dict (dict or None) – {label: units_str}. Missing keys default to no units.
step_size (float or None) – Passed to each subplot for x-label annotation.
bins (int)
show_kde (bool)
show_stats (bool)
ncolumns (int)
figsize_per (tuple) – (width, height) per panel in inches.
color (str)
log_scale (bool)

Return type:

fig, axes (axes is a flat ndarray)
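The grid geometry implied by ncolumns and figsize_per can be worked out ahead of time. This sketch assumes the rows are filled left to right and that the column count is clamped to the number of panels; both are guesses about the implementation, and grid_shape is a hypothetical helper.

```python
import math

def grid_shape(n_panels, ncolumns=2, figsize_per=(5, 3.5)):
    """Rows, columns and overall figure size for a grid of equal panels."""
    ncols = min(ncolumns, n_panels)          # assumed clamp for small inputs
    nrows = math.ceil(n_panels / ncols)
    figsize = (figsize_per[0] * ncols, figsize_per[1] * nrows)
    return nrows, ncols, figsize

# Three properties with the default two columns give a 2 x 2 grid.
print(grid_shape(3))
```

With an odd number of properties the last cell of the grid has no data to show.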

upxo.viz.vizDistr.plot_grouped_distributions(data, prop_labels=None, group_colors=None, group_labels=None, bins=40, bw_method='scott', peak_prominence=0.01, figsize_per=(5, 4), dpi=110, suptitle='Property distributions by group', ncols=None, fontsize=9.0, show_hist=True, show_peaks=True, show_legend=True, x_margin=0.03, do_tight_layout=True)

Overlaid histogram + KDE + peak markers for multiple properties and groups.

Generic plotting function — no knowledge of grain structures or UPXO data formats. Data must be pre-extracted into plain arrays before calling.

Parameters:

data (dict) – {prop_name: {group_name: array-like}} — one entry per property, each containing one array per group. Arrays may be empty; empty/size-1 groups are silently skipped.
prop_labels (dict or None) – {prop_name: display_label} for axis / title text. Missing keys fall back to the prop_name itself.
group_colors (dict or None) – {group_name: colour_string}. Missing keys cycle through a default palette.
group_labels (dict or None) – {group_name: display_label} for legend entries. Missing keys fall back to the group_name itself.
bins (int) – Number of histogram bins (shared x-range across groups per property).
bw_method (str or float) – Bandwidth selector passed to scipy.stats.gaussian_kde.
peak_prominence (float) – Fraction of KDE maximum used as minimum prominence for find_peaks.
figsize_per (tuple) – (width, height) in inches per subplot panel.
dpi (int) – Figure resolution.
suptitle (str) – Figure-level title.
ncols (int or None) – Subplot grid columns. None places all panels in a single row.
fontsize (float) – Base font size; tick labels use fontsize-2, legend fontsize-2, peak annotations fontsize-3, suptitle fontsize+1.
show_hist (bool) – Draw histogram bars behind the KDE curves. Default True.
show_peaks (bool) – Draw vertical dashed lines and value annotations at KDE peaks. Default True.
show_legend (bool) – Draw a per-group legend on each subplot. Default True.
x_margin (float) – Fractional padding added to both sides of the x-axis so that tick labels are never clipped at the axis boundary. Default 0.03.
do_tight_layout (bool) – Call plt.tight_layout() before returning. Set to False when the caller needs to adjust the figure (e.g. to add a colorbar) before finalising the layout. Default True.

Returns:

fig, axes – Figure and 2-D axes array (shape (nrows, ncols_used)).
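The nested data layout and three of the behaviours described above (skipping groups with fewer than two values, sharing one x-range per property, and expressing the peak threshold as a fraction of the KDE maximum) can be sketched without any plotting. All helper names and values below are hypothetical, and the sketch makes no claim about how the function is implemented.

```python
def usable_groups(groups):
    """Drop groups with fewer than two values, which the function skips."""
    return {name: list(vals) for name, vals in groups.items() if len(vals) >= 2}

def shared_bin_edges(groups, bins):
    """Equal-width bin edges spanning all groups of one property."""
    pooled = [v for vals in groups.values() for v in vals]
    lo, hi = min(pooled), max(pooled)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]

def min_prominence(kde_values, peak_prominence=0.01):
    """Absolute prominence threshold derived from the KDE maximum."""
    return peak_prominence * max(kde_values)

# {prop_name: {group_name: array-like}} with placeholder values.
data = {
    "area": {
        "fine": [1.0, 1.5, 2.0, 2.5],
        "coarse": [4.0, 6.0, 9.0],
        "outlier": [50.0],  # size-1 group: skipped
        "empty": [],        # empty group: skipped
    },
}

for prop, groups in data.items():
    kept = usable_groups(groups)
    edges = shared_bin_edges(kept, bins=4)
    print(prop, sorted(kept), edges)
```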

upxo.viz.vizDistr.plot_repr_rank(repr_rank_ng: dict, figsize=None, dpi: int = 100, fontsize_annot: float = 8.0, fontsize_tick: float = 9.0, fontsize_title: float = 9.0, fontsize_suptitle: float = 11.0) → None

Five vertically stacked heatmaps showing the per-property rank of every MC time slice under each representativeness metric (ratio, Wasserstein, energy distance, KS statistic, Anderson–Darling statistic).

Colour encodes rank within each column independently: green = best (rank 1), red = worst (rank N). Cell text shows the raw numeric score. Rows are ordered best-to-worst by the aggregate score (inherited from the DataFrame sort order in repr_rank_ng).

Ranking rule per column:

ratio, property columns – rank by |value − 1| ascending (closest to 1.0 = best)
ratio, aggregate column – rank by value ascending (lowest = best)
wasserstein / energy – rank by value ascending (lowest = best)
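The ranking rule can be expressed in a few lines. The sketch below is illustrative only: the slice keys and scores are made-up placeholders, the function names are hypothetical, and ties are broken by insertion order, which may differ from the module's behaviour.

```python
def rank_ascending(scores):
    """Map each key to its 1-based rank; the smallest score gets rank 1."""
    ordered = sorted(scores, key=scores.get)
    return {key: pos for pos, key in enumerate(ordered, start=1)}

def rank_ratio_column(ratios):
    """Rank a ratio property column by closeness to 1.0."""
    return rank_ascending({key: abs(value - 1.0) for key, value in ratios.items()})

# Placeholder scores for three hypothetical MC time slices.
ratio_area = {"t10": 0.82, "t20": 1.05, "t30": 1.31}
wasserstein_area = {"t10": 0.21, "t20": 0.04, "t30": 0.37}

print(rank_ratio_column(ratio_area))
print(rank_ascending(wasserstein_area))
```

Here slice t20 ranks first under both metrics: its ratio is closest to 1.0 and its distance is the smallest.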

Parameters:

repr_rank_ng (dict) – {'ratio': df, 'wasserstein': df, 'energy': df} — as stored in repgen2d.repr_rank_ng after calling find_repr_mcgs_props.
figsize (tuple or None) – Override default figure size. Default auto-computes from data shape.
dpi (int) – Figure resolution.
fontsize_annot (float) – Font size for the numeric value printed in each cell.
fontsize_tick (float) – Font size for axis tick labels (slice keys on y-axis, column names on x-axis).
fontsize_title (float) – Font size for each panel title.
fontsize_suptitle (float) – Font size for the overall figure title.

upxo.viz.vizDistr.plot_normalized_prop_distributions(ebsd_data: dict, mc_data: dict, props: list, scores: dict | None = None, prop_labels: dict | None = None, bins: int = 40, bw_method='scott', figsize_per: tuple = (5, 4), dpi: int = 100, ncols: int | None = None, fontsize: float = 9.0, show_hist: bool = True, show_peaks: bool = True, legend_loc: str = 'upper right', legend_ncol: int = 1, legend_fontsize: float | None = None) → None

Overlaid normalised property distributions for EBSD (merged) and MC slices.

Each distribution is normalised by its own mean before plotting, matching the normalisation used in find_repr_mcgs_props. All curves are therefore centred near 1.0 on the x-axis and are directly shape-comparable.

Wasserstein and energy distances are annotated in each subplot legend when scores is provided.

Parameters:

ebsd_data (dict) – {prop: array} of EBSD-merged property values, each already divided by its own mean.
mc_data (dict) – {slice_key: {prop: array}} of MC property values, each already divided by its own mean.
props (list of str) – Ordered list of property names to plot.
scores (dict or None) – {slice_key: {prop: {'wasserstein': v, 'energy': v}}} extracted from repr_rank_ng. When supplied, each MC curve’s legend entry is annotated with W=... E=... for the per-property distance.
prop_labels (dict or None) – {prop: display_label}. Defaults to f'{prop} (mean normalized)'.
bins – Forwarded to plot_grouped_distributions().
bw_method – Forwarded to plot_grouped_distributions().
figsize_per – Forwarded to plot_grouped_distributions().
dpi – Forwarded to plot_grouped_distributions().
ncols – Forwarded to plot_grouped_distributions().
fontsize – Forwarded to plot_grouped_distributions().
show_hist – Forwarded to plot_grouped_distributions().
show_peaks – Forwarded to plot_grouped_distributions().
legend_loc (str) – Legend location string passed to ax.legend(loc=...). Examples: 'upper right', 'upper left', 'lower right', 'center left', 'best'. Default 'upper right'.
legend_ncol (int) – Number of columns in the legend. Values > 1 split entries side-by-side, reducing legend height and — when entries are uniform in width — the overall legend footprint. Default 1 (single column).
legend_fontsize (float or None) – Font size for legend text. Reducing this is the most direct way to shrink the legend box since box width is driven by label text length. Defaults to fontsize - 2 when None.
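Mean normalisation and a per-property distance can be illustrated with the standard library. The sketch divides each sample by its own mean and then evaluates the 1-D Wasserstein-1 distance using the sorted-pairs shortcut, which is valid only for two samples of equal size. The values are placeholders, the helper names are hypothetical, and the scores stored in repr_rank_ng may be computed differently (for example with scipy.stats.wasserstein_distance).

```python
import statistics

def normalise_by_mean(values):
    """Divide every value by the sample mean so the result has mean 1.0."""
    mean = statistics.fmean(values)
    return [v / mean for v in values]

def wasserstein_equal_n(a, b):
    """1-D Wasserstein-1 distance for two samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this shortcut needs equally sized samples")
    return statistics.fmean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))

ebsd_area = [10.0, 20.0, 30.0, 60.0]  # placeholder values, arbitrary units
mc_area = [1.0, 2.0, 4.0, 5.0]        # placeholder values on a different scale

ebsd_norm = normalise_by_mean(ebsd_area)
mc_norm = normalise_by_mean(mc_area)
distance = wasserstein_equal_n(ebsd_norm, mc_norm)
print(f"W = {distance:.4f}")
```

Because both samples are rescaled to mean 1.0, the distance reflects differences in shape rather than in absolute grain size.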

upxo.viz.vizDistr.plot_qq_comparison(ebsd_data: dict, mc_data: dict, props: list, prop_labels: dict | None = None, figsize_per: tuple = (4, 4), dpi: int = 100, ncols: int | None = None, fontsize: float = 9.0) → None

Quantile–Quantile (Q-Q) comparison of EBSD vs MC grain property distributions.

A Q-Q plot maps the quantiles of one distribution against the quantiles of another at the same probability levels (0 % to 100 %). Both distributions are normalised by their own mean before comparison, so the x- and y-axes share the same dimensionless scale centred near 1.0.

Interpretation

Points on the diagonal (y = x) — the two distributions have identical shape at that quantile. Perfect agreement.
Points above the diagonal — the MC distribution has larger values than EBSD at that quantile (heavier upper tail or higher spread in MC).
Points below the diagonal — the MC distribution has smaller values than EBSD at that quantile.
Deviations concentrated in the lower-left — fine/small grains differ.
Deviations concentrated in the upper-right — large/coarse grains differ.

One subplot is drawn per property; each MC slice is a separate line. The dashed black diagonal marks perfect distributional agreement.
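The Q-Q construction described above can be sketched as follows: evaluate both samples at the same probability levels and pair the resulting quantiles, with EBSD on the x-axis and MC on the y-axis. The quantile helper uses linear interpolation between order statistics. The sample values, the number of levels, and the function names are assumptions made for the illustration.

```python
def quantile(sorted_vals, p):
    """Linearly interpolated quantile at probability p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1.0 - frac) + sorted_vals[upper] * frac

def qq_points(ref, other, n_levels=5):
    """Pairs (ref quantile, other quantile) at evenly spaced levels 0..1."""
    ref_s, other_s = sorted(ref), sorted(other)
    levels = [i / (n_levels - 1) for i in range(n_levels)]
    return [(quantile(ref_s, p), quantile(other_s, p)) for p in levels]

ebsd = [0.4, 0.7, 1.0, 1.3, 1.6]  # placeholder, mean-normalised
mc = [0.2, 0.6, 1.0, 1.4, 1.8]    # placeholder, same mean but wider spread

for x, y in qq_points(ebsd, mc):
    side = "above" if y > x else "below" if y < x else "on"
    print(f"EBSD {x:.2f}  MC {y:.2f}  -> {side} the diagonal")
```

In this example the MC sample has the wider spread, so its points fall below the diagonal at low quantiles and above it at high quantiles.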

Parameters:

ebsd_data (dict) – {prop: array} of EBSD-merged values, each normalised by own mean.
mc_data (dict) – {slice_key: {prop: array}} of MC values, each normalised by own mean.
props (list of str) – Properties to plot.
prop_labels (dict or None) – {prop: display_label}. Defaults to f'{prop} (mean normalized)'.
figsize_per (tuple) – (width, height) per subplot in inches.
dpi (int)
ncols (int or None) – Subplot grid columns. None places all panels in a single row.
fontsize (float)

upxo.viz.vizDistr.plot_ebsd_tvf(tvf_result: dict, figsize: tuple = (7, 4), dpi: int = 100, fontsize: float = 9.0, title: str = 'EBSD grain-role area fractions') → None

Horizontal bar chart of EBSD twin area fraction broken down by grain role.

Bars are drawn for each of the four grain-role categories:

Pure parents — matrix grains; never a twin of any grain.
Primary twins — first-generation twins whose parent is a pure parent.
Secondary twins — twins whose parent is itself an intermediate (twin-of-a-twin, 2nd generation).
Intermediate twins — grains that are simultaneously a twin of one grain and a parent of another (twin chains).

The overall twin area fraction (primary + secondary + intermediate) is annotated on the figure.

Parameters:

tvf_result (dict) – Output of repgen2d.compute_ebsd_tvf. Must contain keys 'pure_parent_frac', 'primary_twin_frac', 'secondary_twin_frac', 'intermediate_frac', 'overall_twin_frac'.
figsize (tuple) – Figure size (width, height) in inches.
dpi (int) – Figure resolution.
fontsize (float) – Base font size for labels and tick marks.
title (str) – Figure title.
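The expected shape of tvf_result, and the stated relation between the overall twin fraction and its three components, can be checked with a small helper. The function name check_tvf is hypothetical and the fractions are invented placeholders, not output of repgen2d.compute_ebsd_tvf.

```python
TVF_KEYS = (
    "pure_parent_frac", "primary_twin_frac", "secondary_twin_frac",
    "intermediate_frac", "overall_twin_frac",
)

def check_tvf(tvf, tol=1e-9):
    """Return (missing_keys, overall_is_consistent) for a tvf_result dict."""
    missing = [key for key in TVF_KEYS if key not in tvf]
    if missing:
        return missing, False
    # Overall twin fraction = primary + secondary + intermediate.
    parts = (tvf["primary_twin_frac"] + tvf["secondary_twin_frac"]
             + tvf["intermediate_frac"])
    return missing, abs(parts - tvf["overall_twin_frac"]) <= tol

# Invented placeholder fractions for illustration only.
tvf_result = {
    "pure_parent_frac": 0.5,
    "primary_twin_frac": 0.25,
    "secondary_twin_frac": 0.125,
    "intermediate_frac": 0.125,
    "overall_twin_frac": 0.5,
}
print(check_tvf(tvf_result))
```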