Data Visualization - Informatik learn online

Informatik (Subject) / Data Visualization (Lesson)

There are 38 cards in this lesson

Basics of Data Visualization

This lesson was created by freddy_teddy.


statistical data types: nominal (discrete, no ordering; only =, !=); ordinal (discrete, ordering but no distance; =, !=, <, >); quantitative (discrete or continuous, ordering, distance; =, !=, <, >, +, -). Also: sequential, diverging, tabular, relational, spatial, temporal.
line charts: best for quantitative -> quantitative mappings; vertical distances are hard for humans to read; further information can be encoded with glyphs and color.
horizon graphs: start from a normal line chart; split the area below the line horizontally into bands of equal height; give every band a different color; overlay the bands. The bands cannot intersect each other.
pie charts: for univariate data; the values should add up to 100%.
bubble charts: up to 4 scalar variables (x coordinate, y coordinate, size of disc, color of disc); disc sizes are hard to estimate.
marks and channels: marks are the geometric primitives of the encoding (e.g. points, lines, areas, volumes); channels control the appearance of a mark (e.g. position, size, shape, orientation, color). Pay attention to how strongly a change is perceived in different marks and channels (hard to perceive are e.g. brightness, saturation, area, depth).
Visualization Process: 1. data IO -> raw data; 2. data analysis -> prepared data; 3. filtering -> focused data; 4. mapping -> geometry; 5. rendering.
exploratory data analysis (EDA): uncover underlying structure; extract important factors; detect anomalies; test assumptions; develop parsimonious models.
dataset statistics for high-dimensional data (different plot types): histograms (number of bins: too few -> little information; too many -> poor overview, possibly many empty bins; description of the shape: symmetric, skewed left/right, uni-/bi-/multimodal); boxplots; scatterplots.
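The effect of the bin count can be tried out with a few lines of Python. This is only a minimal sketch: the `histogram` helper, its equal-width binning and the sample data are assumptions made for illustration, not part of the lesson.

```python
def histogram(values, num_bins):
    """Count values in num_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 if all values are identical (avoids division by zero).
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]   # made-up sample with one far-away value
few = histogram(data, 2)     # too few bins: the shape of the data is hidden
many = histogram(data, 16)   # too many bins: most bins stay empty
print(few)
print(many, "empty bins:", many.count(0))
```

With two bins almost everything lands in the first bin, while sixteen bins leave most bins empty, which matches the trade-off described on the card.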
boxplots: depiction of a distribution. Structure: box spanning the interquartile range (lower quartile, median, upper quartile), i.e. the middle 50% of the data, a measure of data variation; whiskers (definitions differ, e.g. at most 1.5 times the length of the interquartile range, ending at a data point); minimum and maximum at the whisker ends; outliers.
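The boxplot quantities can be computed as in the following sketch. It assumes the 1.5 x IQR whisker rule named on the card and the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile definitions give slightly different numbers. The function name `boxplot_stats` and the sample data are made up for illustration.

```python
import statistics


def boxplot_stats(values, whisker_factor=1.5):
    """Quartiles, whisker ends and outliers for one common boxplot definition."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                        # length of the box (middle 50%)
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this sample the value 30 lies beyond the upper fence, so it is reported as an outlier and the upper whisker ends at 9.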
data transformation, visual mapping, view transformation: data transformation: methods for analysis (dimensionality reduction, regression, subspace clustering, feature extraction, topological analysis, data sampling, abstraction); visual mapping: organizes the information from the data transformation stage using visual encodings on axes (e.g. parallel coordinate plots), glyphs, hierarchical representations, animation; view transformation: focuses on screen space and rendering (not covered in detail in this lecture).
scatterplot matrix (SPLOM): axis-based technique, visual mapping stage; scatterplots for all pairwise combinations of variables; use colors to encode classes; works only for a limited number of variables and more or less continuous data. Problem 1: many data points -> overplotting, occlusion. Problem 2: too many glyph-based data points -> heavy load on the graphics card; remedies: textures, kernel density estimates (KDE). Problem 3: no inherent scaling/aspect ratio; remedy: minimize the total edge length of the Delaunay graph.
parallel coordinates: for quantitative data tuples; each tuple is drawn as a polyline connecting its values on neighboring axes. Information shown: distribution of the data per variable; frequent/rare values; relationship between neighboring variables. Occlusion: use transparency or color encoding. Only up to 12 variables fit next to each other; the ordering of the axes has to be chosen -> bias.
parallel coordinate matrix (PCM): construction is a graph-theoretic problem: find a Hamiltonian decomposition of the complete graph; Lucas-Walecki Hamiltonian decomposition; Hurley and Oldford algorithm.
radial layouts for charts: star coordinate plot; RadViz.
glyph-based techniques for multivariate data: visual mappings that encode all variables of one data record in a single visual item; Chernoff faces; radar chart (can be both glyph- and axis-based).
small multiples: apply the same type of visualization to multiple (sub)sets of the data and display them next to each other. Guidelines: all relevant subimages are in the field of view; the order is systematic and the rule is known to the user; the visual design of the subplots is consistent (e.g. same color map).
pixel-oriented approaches: e.g. heatmaps; use color coding in a matrix to encode tabular data.
PCA: data transformation, method for analysis; statistical procedure that identifies a new set of orthogonal axes.
variance, covariance: Var(X) = E[(X - E[X])^2] (the covariance of a variable with itself); Cov(X, Y) = E[(X - E[X])(Y - E[Y])]; the covariance measures the joint variability of two variables and shows the tendency of their linear relationship.
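Both definitions translate directly into code. The sketch below treats the expectation E[...] as the plain average over the given samples (the population version, dividing by n and not by n - 1); the helper names and sample values are chosen for illustration only.

```python
def mean(xs):
    """Sample average, standing in for the expectation E[X]."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Cov(X, Y) = E[(X - E[X]) (Y - E[Y])] over paired samples."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def variance(xs):
    """Var(X) is the covariance of X with itself."""
    return covariance(xs, xs)


x = [1, 2, 3, 4]
rising = [2, 4, 6, 8]     # grows with x  -> positive covariance
falling = [8, 6, 4, 2]    # shrinks with x -> negative covariance
print(variance(x), covariance(x, rising), covariance(x, falling))
```

The sign of the covariance shows the direction of the linear tendency: positive when both variables rise together, negative when one falls as the other rises.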
PCA: eigendecomposition, eigenvectors, eigenvalues: perform an eigendecomposition of the covariance matrix to get Q (matrix with the eigenvectors as columns) and Λ (diagonal matrix with the eigenvalues).
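For two variables the eigendecomposition can be written out by hand, which makes the roles of Q and Λ visible. The following is a sketch restricted to 2D data and using the closed-form solution for a symmetric 2x2 matrix; for more dimensions one would normally call a linear algebra library. All function names and the sample points are assumptions for illustration.

```python
import math


def covariance_matrix_2d(points):
    """Return (sxx, sxy, syy), the entries of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy


def eigen_symmetric_2x2(a, b, c):
    """Eigenpairs of [[a, b], [b, c]], largest eigenvalue first."""
    if b == 0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (half_trace + radius, half_trace - radius):
        # (b, lam - a) solves (A - lam*I) v = 0; normalize it to unit length.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


points = [(0, 0), (1, 1), (2, 2), (3, 3)]   # all points lie on the line y = x
(lam1, v1), (lam2, v2) = eigen_symmetric_2x2(*covariance_matrix_2d(points))
print(lam1, v1)   # first principal axis: direction of largest variance
print(lam2, v2)   # second axis, orthogonal to the first
```

The eigenvectors are the columns of Q (the new orthogonal axes) and the eigenvalues are the diagonal of Λ (the variance along each axis). For points on the line y = x all variance falls on the diagonal direction and none on the orthogonal one.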
explore a novel dataset vs. finding specific information (what to show): explore: overview -> zoom and filter -> details on demand; find: search -> show context -> expand on demand.
What is stated in Hick's law and in Fitts's law? Hick: the time it takes a person to make a decision depends on the number of options. Fitts: the time needed to click on something depends on the distance to the target and the width of the target.
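Both laws are usually stated with a logarithm, which the card does not spell out. The sketch below uses the common textbook forms T = b * log2(n + 1) for Hick's law and MT = a + b * log2(D / W + 1) (Shannon formulation) for Fitts's law. The constants a and b are empirical and have to be fitted per user and device; the default values here are placeholders, not measured data.

```python
import math


def hick_time(num_options, b=0.2):
    """Hick's law: decision time grows logarithmically with the number of options."""
    return b * math.log2(num_options + 1)


def fitts_time(distance, width, a=0.1, b=0.15):
    """Fitts's law (Shannon form): pointing time from target distance and width."""
    return a + b * math.log2(distance / width + 1)


# Doubling the number of menu entries adds far less than double the time.
print(hick_time(8), hick_time(16))
# A wider target at the same distance is faster to hit.
print(fitts_time(distance=300, width=20), fitts_time(distance=300, width=100))
```

For interface design this suggests keeping the number of simultaneous choices small and making frequently used targets large and close to the pointer.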
Ways to visualize a graph: containment or connection: node-link diagram (for networks or trees); adjacency matrix (for networks or trees); enclosure (for trees, not for networks) (layered node-link, icicle, radial, concentric circles, nested circles (balloon), treemap, indented outline).
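algorithms for layered node-link diagrams, digraphs and undirected graphs: Reingold-Tilford (layered node-link diagrams of trees); Sugiyama framework (digraphs; the dot algorithm supplies the heuristics); Fruchterman-Reingold (force-directed layout of undirected graphs).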
radial layout for trees: more space-efficient; problems: levels are more difficult to read, no clear sorting of children, placement of text; algorithm: radial variant of Reingold-Tilford.
def: level graph, hierarchy: level graph: G(V,E,λ), a directed acyclic graph with a mapping λ that partitions V into layers; hierarchy: G(V,E,λ), a level graph in which for every node in one layer there is a node in a previous layer it is connected to.
Sugiyama framework: step 1: get rid of cycles; step 2: assign levels to nodes, insert dummy nodes for long edges; step 3: sort the nodes within each level to minimize edge crossings; step 4: adjust the x coordinates to straighten edges.
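Step 2 can be illustrated with a small sketch. It uses longest-path layering, a simple technique swapped in here for illustration (the card does not say which layering method the lecture uses), and it assumes the input is already acyclic, i.e. step 1 is done. The function names and the dummy-node naming scheme are hypothetical.

```python
from collections import defaultdict


def assign_levels(nodes, edges):
    """Longest-path layering: level = length of the longest path from a source."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    level = {}

    def visit(v):
        if v not in level:
            # Sources have no predecessors and end up on level 0.
            level[v] = 1 + max((visit(u) for u in preds[v]), default=-1)
        return level[v]

    for v in nodes:
        visit(v)
    return level


def insert_dummy_nodes(edges, level):
    """Split edges spanning several levels; adds the dummy nodes to `level`."""
    short_edges = []
    for u, v in edges:
        prev = u
        for lvl in range(level[u] + 1, level[v]):
            dummy = ("dummy", u, v, lvl)    # hypothetical naming scheme
            level[dummy] = lvl
            short_edges.append((prev, dummy))
            prev = dummy
        short_edges.append((prev, v))
    return short_edges


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]   # a -> c skips one level
level = assign_levels(nodes, edges)
short_edges = insert_dummy_nodes(edges, level)
print(level)
print(short_edges)
```

After this step every edge connects two neighboring levels, which is what the crossing minimization in step 3 relies on.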
dot algorithm: Sugiyama framework with heuristics for the NP-complete subproblems. step 1: DFS from a source (or a node with few incoming edges) -> spanning tree -> reverse the back edges; step 2: integer linear program; step 3: initial layout, improved iteratively on a slice-by-slice basis (in each level the nodes get a weight); step 4: linear program.
optional user-defined restrictions for force-directed layout of undirected graphs: fixed positions for some nodes; fixed subgraphs; variable edge length depending on a parameter/function.
problems with force-directed layout for undirected graphs: high cost (O(n^2), optimized O(n log n)); easy to get trapped in local minima; not suited for dense graphs (works for <1000 nodes, few edges per node).
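Where the quadratic cost comes from is visible in a single iteration of a force-directed layout. The sketch below follows the Fruchterman-Reingold idea (repulsion k^2/d between all node pairs, attraction d^2/k along edges) in simplified form: a fixed `max_move` cap stands in for the cooling temperature, and there is no bounding frame. The function name and parameters are assumptions for illustration.

```python
import math


def fr_step(pos, edges, k=1.0, max_move=0.1):
    """One simplified Fruchterman-Reingold iteration; returns new positions."""
    nodes = list(pos)
    disp = {v: [0.0, 0.0] for v in nodes}

    # Repulsion between every pair of nodes: this double loop is the O(n^2) part.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            force = k * k / d
            disp[u][0] += dx / d * force
            disp[u][1] += dy / d * force
            disp[v][0] -= dx / d * force
            disp[v][1] -= dy / d * force

    # Attraction only along edges.
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9
        force = d * d / k
        disp[u][0] -= dx / d * force
        disp[u][1] -= dy / d * force
        disp[v][0] += dx / d * force
        disp[v][1] += dy / d * force

    # Move each node along its net force, capped at max_move per iteration.
    new_pos = {}
    for v in nodes:
        length = math.hypot(disp[v][0], disp[v][1]) or 1e-9
        step = min(length, max_move)
        new_pos[v] = (pos[v][0] + disp[v][0] / length * step,
                      pos[v][1] + disp[v][1] / length * step)
    return new_pos


far = fr_step({"a": (0.0, 0.0), "b": (4.0, 0.0)}, [("a", "b")])
near = fr_step({"a": (0.0, 0.0), "b": (0.5, 0.0)}, [])
print(far)    # connected nodes that are too far apart move closer
print(near)   # unconnected nodes that are too close move apart
```

Repeating this step until the positions settle gives a layout, but the result depends on the start positions, which is the local-minimum problem named on the card.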
def scalar field and use cases in 2D and 3D: a scalar field is a triple (D, C, f), where D is the domain of definition, C is the scalar range and f: D -> C. 2D: color map, height field, isolines. 3D: cutting planes, volume rendering, isosurfaces.
general dataset structure: geometry: finite set of points; topology: neighborhood relationship between points; attributes: one or more functions mapping points in D into other spaces.
types of grids: uniform, rectilinear, structured, unstructured, scattered.
isolines: how many cases for triangles and quads? triangles: 4 relevant (8 in total); quads: 6 relevant (16 in total; can be cut to 4 if + and - are regarded as symmetric), one of them ambiguous.
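The case counts can be checked by brute force. The sketch below enumerates all sign assignments (0 = below, 1 = above the isovalue) for the cell vertices and groups assignments that differ only by a rotation of the cell, optionally also by swapping + and -. Treating "relevant" as "distinct up to rotation" is an interpretation of the card, and the function name is made up.

```python
from itertools import product


def count_cases(num_vertices, use_sign_symmetry=False):
    """Count sign patterns of a cell that are distinct up to rotation."""
    seen = set()
    classes = 0
    for signs in product((0, 1), repeat=num_vertices):
        if signs in seen:
            continue
        classes += 1
        variants = [signs]
        if use_sign_symmetry:
            # Swapping + and - gives the same isoline geometry.
            variants.append(tuple(1 - s for s in signs))
        for variant in variants:
            for r in range(num_vertices):
                # All cyclic rotations belong to the same case.
                seen.add(variant[r:] + variant[:r])
    return classes


print("triangles:", 2 ** 3, "total,", count_cases(3), "up to rotation")
print("quads:", 2 ** 4, "total,", count_cases(4), "up to rotation,",
      count_cases(4, use_sign_symmetry=True), "with sign symmetry")
```

The ambiguous quad case is the one with alternating signs around the cell (two diagonally opposite vertices above the isovalue), where two different line pairings are possible.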
types of color spaces: RGB/CMYK: specify color for technical devices (screen, printer); HSV/HSL: easy color specification for humans; CIE Lab: distances match perceived distances.
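Why HSV is considered easier for humans can be seen in a short example using Python's standard `colorsys` module, which works with components in the range 0 to 1. The `darken` helper is a made-up name for illustration.

```python
import colorsys


def darken(rgb, factor):
    """Darken a color by scaling only the V (value) component in HSV."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s, v * factor)


red = (1.0, 0.0, 0.0)
print(colorsys.rgb_to_hsv(*red))   # hue, saturation, value of pure red
dark_red = darken(red, 0.5)
print(dark_red)
```

"Same hue, but darker" is a single-parameter change in HSV, while in RGB all three device channels may have to be adjusted together. Neither space guarantees that equal numeric steps look equally large, which is the aim of CIE Lab.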
direct and indirect volume rendering: indirect: detour via geometry; the user controls the geometry via parameters (isovalue, cutting plane); problem: strict criteria, shows only a small part of the data. direct: similar to ray tracing; the data is considered to represent a semi-transparent, light-emitting medium (emission, absorption); transfer function: scalar -> visual properties.
Levoy's volume rendering pipeline: step 1: sampling; step 2: classification and illumination; step 3: compositing.
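The classification and compositing steps can be sketched for a single ray. The example below assumes the scalar samples along the ray are already given (step 1), maps each one to a gray value and an opacity with a made-up transfer function, and accumulates them front to back. Illumination is left out, and front-to-back order with early termination is one common formulation, not necessarily the one used in the lecture.

```python
def simple_transfer_function(scalar):
    """Hypothetical classification: scalar in [0, 1] -> (gray color, opacity)."""
    return scalar, 0.5 * scalar


def composite_ray(samples, transfer_function, opacity_cutoff=0.99):
    """Front-to-back emission/absorption compositing along one ray."""
    color, alpha = 0.0, 0.0
    for scalar in samples:
        c, a = transfer_function(scalar)
        # What is already accumulated in front hides a share of this sample.
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= opacity_cutoff:
            break   # early ray termination: nothing behind is visible
    return color, alpha


ray_samples = [0.0, 0.2, 0.8, 1.0, 0.3]   # made-up scalar values along a ray
print(composite_ray(ray_samples, simple_transfer_function))
half = composite_ray([1.0, 1.0], lambda s: (1.0, 0.5))
opaque = composite_ray([0.2, 0.9], lambda s: (s, 1.0))
```

Changing the transfer function changes which scalar ranges become visible without touching the sampling or compositing code.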
