Computer Science (Subject) / Data Visualization (Lesson)

There are 38 cards in this lesson

Basics of Data Visualization

This lesson was created by freddy_teddy.

  • statistical data types: nominal (discrete, no ordering; only =, !=); ordinal (discrete, ordering but no distance; =, !=, <, >); quantitative (discrete or continuous, ordering and distance; =, !=, <, >, +, -); also: sequential, diverging, tabular, relational, spatial, temporal
  • line charts: best for quant -> quant; vertical distance is hard for humans to read; further information can be encoded with glyphs or color
  • horizon graphs: start with a normal line chart; split the area below the line horizontally into bands of equal height; give every band a different color; overlay the bands; the bands cannot intersect
  • pie charts: for univariate data; values should add up to 100%
  • bubble charts: up to 4 scalar variables (x and y coordinate, size of disc, color of disc); disc sizes are hard to estimate
  • marks and channels: marks are the shape/geometry of the encoding (e.g. point, line, area, volume); channels control the appearance of a mark (e.g. position, size, shape, orientation, color); pay attention to how well changes in intensity are perceived for different marks and channels (hard to perceive are e.g. brightness, saturation, area, depth)
  • Visualization Process: 1. data IO -> raw data; 2. data analysis -> prepared data; 3. filtering -> focused data; 4. mapping -> geometry; 5. rendering
  • exploratory data analysis (EDA): uncover underlying structure; extract important factors; detect anomalies; test assumptions; develop parsimonious models
  • dataset statistics for high-d data (different plot types): histograms (number of bins: too few -> little info, too many -> bad overview, maybe many empty bins; description: symmetric, skewed left/right, uni-/bi-/multimodal); boxplots; scatterplots
  • boxplots: depiction of a distribution; structure: interquartile range (median, upper and lower quartile; the middle 50% of the data; a measure of data variation); whiskers (different definitions, e.g. at most 1.5 times the length of the interquartile range, ending at a data point); minimum and maximum of the whiskers; outliers
  • data transformation, visual mapping, view transformation: data transformation (methods for analysis: dim reduction, regression, subspace clustering, feature extraction, topological analysis, data sampling, abstraction); visual mapping (organizes the information from the data transformation stage; visual encodings on axes (e.g. parallel coordinate plots), glyphs, hierarchical representation, animation); view transformation (focuses on screen space and rendering, not covered in detail in this lecture)
  • scatterplot matrix (SPLOM): axis-based technique, visual mapping stage; scatterplots for all pairwise combinations; use colors to encode classes; works only for a limited number of variables and more or less continuous data; problem 1: many data points -> overplotting, occlusion; problem 2: too many glyph-based data points -> heavy load on the graphics card (textures; kernel density estimates (KDE)); problem 3: no inherent scaling/aspect ratio (minimize the total edge length of the Delaunay graph)
  • parallel coordinates: for quantitative data tuples; each tuple is drawn as a polyline whose vertices are its values on the axes; information shown: distribution of data per variable, frequent/rare values, relationship between neighboring variables; occlusion: use transparency or color encoding; only up to 12 variables next to each other; the order of the axes has to be chosen -> bias
  • parallel coordinate matrix (PCM): construction is a graph-theoretic problem: find a Hamiltonian decomposition of the complete graph; Lucas-Walecki Hamiltonian decomposition; Hurley and Oldford algorithm
  • radial layouts for charts: star coordinate plot, RadViz
  • glyph-based techniques for multivariate data: visual mappings that encode all variables of one data record in a single visual item; Chernoff faces; radar chart (can be both glyph- and axis-based)
  • small multiples: apply the same type of vis to multiple (sub)sets of the data and display them next to each other; guidelines: all relevant subimages are in the field of view; the order is systematic and the rule is known to the user; the visual design of the subplots is consistent (e.g. same color map)
  • pixel-oriented approaches: e.g. heatmaps; use color coding in a matrix to encode tabular data
  • PCA: data transformation, method for analysis; statistical procedure that identifies a new set of orthogonal axes
  • variance, covariance: Var(X) = E[(X - E[X])^2] (covariance of a variable with itself); Cov(X, Y) = E[(X - E[X])(Y - E[Y])]: measure of the joint variability of two variables, shows the tendency in the linear relationship of the variables
  • TODO PCA: Eigendecomposition, Eigenvector, Eigenvalue: perform an eigendecomposition of the covariance matrix to get Q (matrix with the eigenvectors as columns) and Λ (diagonal matrix with the eigenvalues)
  • explore a novel dataset vs. finding specific information (what to show): explore: overview -> zoom and filter -> details on demand; find: search -> show context -> expand on demand
  • What is stated in Hick's law and in Fitts's law? Hick: the time it takes a person to make a decision depends on the number of options. Fitts: the time needed to click on a target depends on the distance to the target and on its width.
  • Ways to visualize a graph: containment or connection; node-link diagram (for networks or trees); adjacency matrix (for networks or trees); enclosure (for trees, not for networks); (layered node-link, icicle, radial, concentric circles, nested circles (balloon), treemap, indented outline)
  • algorithms for layered node-link diagrams, digraphs and undirected graphs: Reingold-Tilford (layered node-link diagrams); Sugiyama framework (digraphs; dot algorithm for heuristics); Fruchterman-Reingold (undirected graphs)
  • radial layout for trees: more space-efficient; problems: levels are more difficult to read, no clear sorting of children, placement of text; algorithm: radial variant of Reingold-Tilford
  • def: level graph, hierarchy: level graph: G(V,E,λ), a directed acyclic graph with a mapping λ that partitions V into layers; hierarchy: G(V,E,λ), a level graph in which for every node in one layer there is a node in a previous layer that it is connected to
  • Sugiyama framework: step 1: get rid of cycles; step 2: assign levels to nodes, insert dummy nodes for long edges; step 3: sort the nodes per level to minimize edge crossings; step 4: modify the x coordinates to straighten edges
  • dot algorithm: Sugiyama with heuristics to solve NP-complete problems; step 1: DFS from a source (or a node with few incoming edges) -> spanning tree -> reverse back edges; step 2: integer linear program; step 3: initial layout, improved iteratively on a slice-by-slice basis (in each level the nodes get a weight); step 4: linear program
  • optional user-defined restrictions for force-directed layout of undirected graphs: fixed node positions for some nodes; fixed subgraphs; variable edge length depending on a parameter/function
  • problems with force-directed layout for undirected graphs: high cost (O(n^2), optimized O(n log n)); easy to get trapped in local minima; not suited for dense graphs (works for <1000 nodes, few edges per node)
  • def scalar field and use cases in 2D and 3D: (D, C, f), where D is the domain of definition, C is the scalar range and f: D -> C; 2D: color map, height field, isolines; 3D: cutting planes, volume rendering, isosurfaces
  • general dataset structure: geometry: finite set of points; topology: neighborhood relationship between points; attributes: one or more functions mapping points in D into other spaces
  • types of grids: uniform, rectilinear, structured, unstructured, scattered
  • isolines: how many cases for triangles and quads? triangles: 4 relevant (8 in total); quads: 6 relevant (can be cut to 4 if + and - are regarded as symmetric; 16 in total), one ambiguous
  • types of color spaces: RGB/CMYK: specify color for technical devices (screen, printer); HSV/HSL: easy color specification for humans; CIE Lab: distances match perceived distances
  • direct and indirect volume rendering: indirect: detour via geometry, the user controls the geometry via parameters (isovalue, cutting plane); problem: strict criteria, shows only a small part of the data; direct: similar to ray tracing, the data is considered to represent a semi-transparent, light-emitting medium; emission, absorption; transfer function: scalar -> visual properties
  • Levoy's volume rendering pipeline: step 1: sampling; step 2: classification and illumination; step 3: compositing
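
The sampling, classification and compositing steps of the volume rendering pipeline card can be sketched for a single ray. This is only a minimal illustration under stated assumptions: the transfer function `transfer_function` is hypothetical (gray value equals the scalar, opacity is half of it), illumination is left out, and the front-to-back compositing order with early ray termination is one common choice, not necessarily the one used in the lecture.

```python
def transfer_function(scalar):
    """Hypothetical transfer function: scalar in [0, 1] -> (gray value, opacity)."""
    clamped = min(max(scalar, 0.0), 1.0)
    return clamped, 0.5 * clamped


def composite_ray(samples, opacity_cutoff=0.99):
    """Front-to-back compositing of scalar samples taken along one ray.

    Each sample is classified with the transfer function (emission and
    absorption) and then blended into the accumulated color and opacity.
    """
    color, alpha = 0.0, 0.0
    for scalar in samples:
        sample_color, sample_alpha = transfer_function(scalar)
        # Light from this sample is attenuated by everything in front of it.
        color += (1.0 - alpha) * sample_alpha * sample_color
        alpha += (1.0 - alpha) * sample_alpha
        if alpha >= opacity_cutoff:
            break  # early ray termination: the rest is practically hidden
    return color, alpha


if __name__ == "__main__":
    print(composite_ray([0.0, 1.0, 1.0]))
```

Samples with scalar value 0 are fully transparent under this transfer function, so they leave the accumulated color and opacity unchanged.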
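
The variance, covariance and PCA cards can be checked numerically on a small two-variable example. The sketch below uses only the standard library and the closed-form eigendecomposition of a symmetric 2x2 matrix; the function names and the population (divide by n) convention are assumptions made for illustration.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)


def pca_2d(xs, ys):
    """Eigendecomposition of the covariance matrix [[a, b], [b, c]].

    Returns the eigenvalues in descending order and the matching unit
    eigenvectors (the new orthogonal axes).
    """
    a, b, c = var(xs), cov(xs, ys), var(ys)
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + root, half_trace - root]
    if b == 0:
        # Already diagonal: the axes themselves are the eigenvectors.
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        eigenvectors = []
        for lam in eigenvalues:
            v = (b, lam - a)  # solves (a - lam) * v0 + b * v1 = 0
            norm = math.hypot(v[0], v[1])
            eigenvectors.append((v[0] / norm, v[1] / norm))
    return eigenvalues, eigenvectors


if __name__ == "__main__":
    print(pca_2d([1, 2, 3, 4], [2, 4, 6, 8]))
```

For perfectly correlated data such as y = 2x, all variance falls on the first axis, which points along the direction (1, 2), and the second eigenvalue is zero.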
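
The quantities named on the boxplot card can be computed as follows. This is a sketch under two assumptions that the card leaves open: quartiles are obtained by linear interpolation between sorted values, and the whiskers end at the most extreme data points within 1.5 times the interquartile range of the box. Other definitions exist and give slightly different numbers.

```python
def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighboring sorted values."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def boxplot_stats(values, whisker_factor=1.5):
    """Return quartiles, IQR, whisker ends and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = (quantile(data, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # whiskers end at actual data points
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

In the example data the value 30 lies beyond the upper fence, so it is reported as an outlier and the upper whisker ends at 9.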
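
The case counts on the isolines card can be reproduced by enumerating all sign configurations of the cell vertices. The sketch assumes that "relevant" means distinct up to rotation of the cell, and that regarding + and - as symmetric means a configuration and its sign-flipped copy count as one case; the card does not state this explicitly.

```python
from itertools import product


def count_cases(n_vertices, merge_sign_flip=False):
    """Count vertex sign configurations of a cell, distinct up to rotation.

    With merge_sign_flip=True a configuration and its complement
    (+ and - swapped) are counted as the same case.
    """
    seen = set()
    classes = 0
    for config in product((0, 1), repeat=n_vertices):
        if config in seen:
            continue
        classes += 1
        variants = [config]
        if merge_sign_flip:
            variants.append(tuple(1 - s for s in config))
        # Mark every rotation of the configuration (and of its complement).
        for variant in variants:
            for shift in range(n_vertices):
                seen.add(variant[shift:] + variant[:shift])
    return classes


if __name__ == "__main__":
    print("triangles:", 2 ** 3, "total,", count_cases(3), "up to rotation")
    print("quads:", 2 ** 4, "total,", count_cases(4), "up to rotation,",
          count_cases(4, merge_sign_flip=True), "with sign symmetry")
```

The ambiguous quad case is the one where the signs alternate around the cell, for example (+, -, +, -): the isoline can connect the edge crossings in two different ways.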