This lesson contains 38 flashcards.

Basics of Data Visualization

This lesson was created by freddy_teddy.

  • statistical data types
    nominal: discrete, no ordering; only =, !=
    ordinal: discrete, ordering but no distance; =, !=, <, >
    quantitative: discrete or continuous, ordering and distance; =, !=, <, >, +, -
    also: sequential, diverging, ...
  • line charts: best for quantitative -> quantitative data; vertical distances (e.g. between two curves) are hard for humans to read; further information via glyphs or color
  • horizon graphs
    start from a normal line chart
    split the area below the line horizontally into bands of equal height
    every band gets a different color; the bands are overlaid
    the bands cannot intersect
  • pie charts: for univariate data; the values should add up to 100%
  • bubble charts: up to 4 scalar variables (x and y coordinate, size of the disc, color of the disc); disc sizes are hard to estimate
  • marks and channels
    marks: geometric primitives of the encoding (e.g. points, lines, areas, volumes)
    channels: visual properties that control the appearance of a mark (e.g. position, size, shape, orientation, color)
    pay attention to the perceived intensity change of different marks and ...
  • visualization process
    1. data I/O -> raw data
    2. data analysis -> prepared data
    3. filtering -> focused data
    4. mapping -> geometry
    5. rendering
  • exploratory data analysis (EDA): uncover the underlying structure, extract important factors, detect anomalies, test assumptions, develop parsimonious models
  • dataset statistics for high-d data (different plot ...
    histograms
        number of bins: too few -> little information, too many -> poor overview, possibly many empty bins
        description: symmetric, skewed left/right, uni-/bi-/multimodal
    boxplots
    scatterplots
  • boxplots: depiction of the distribution structure
    interquartile range: median, upper and lower quartile; contains the middle 50% of the data
        measure of data variation
    whiskers (different definitions, e.g. maximal ...
  • data transformation, visual mapping, view transformation ...
    data transformation:
        methods for analysis: dimensionality reduction, regression, subspace clustering, feature extraction, topological analysis, data sampling, abstraction
    visual mapping:
        organizing information ...
  • scatterplot matrix (SPLOM): axis-based technique, visual mapping stage
    scatterplots for all pairwise combinations of variables
    use colors to encode classes
    works only for a limited number of variables and more or less continuous data
    problem ...
  • parallel coordinates: for quantitative data tuples
    each tuple is drawn as a polyline whose segments connect its values on two neighboring axes
    information:
        distribution of data per variable
        frequent/rare values
        relationship between neighboring ...
  • parallel coordinate matrix (PCM): construction is a graph-theoretic problem, namely finding a Hamiltonian decomposition of the complete graph on the variables
    Lucas-Walecki Hamiltonian decomposition
    Hurley and Oldford algorithm
  • radial layouts for charts: star coordinate plot, RadViz
  • glyph-based techniques for multivariate data: visual mappings that encode all variables of one data record in a single visual item; e.g. Chernoff faces, radar chart (can be both glyph- and axis-based)
  • small multiples: apply the same type of visualization to multiple (sub)sets of the data and display them next to each other
    guidelines:
        all relevant subimages are in the field of view
        the order is systematic and the rule is known to the user
        visual ...
  • pixel-oriented approaches (e.g. heatmaps): use color coding in a matrix to encode tabular data
  • PCA: data transformation, method for analysis; statistical procedure that identifies a new set of orthogonal axes, ordered by the amount of variance they capture
  • variance, covariance
    $\mathrm{Var}(X) = E[(X - E[X])^2]$ (covariance of a variable with itself)
    $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$
        measure of the joint variability of two variables; shows the tendency in the linear relationship of the ...
  • TODO PCA: eigendecomposition, eigenvector, eigenvalue ...
    perform an eigendecomposition of the covariance matrix, $C = Q \Lambda Q^{\top}$, to get Q (matrix with the eigenvectors as columns) and $\Lambda$ (diagonal matrix with the eigenvalues)
  • explore a novel dataset vs. finding specific information ...
    explore: overview -> zoom and filter -> details on demand
    find: search -> show context -> expand on demand
  • What is stated in Hick's law and in Fitts's law?
    Hick: the time it takes a person to make a decision grows (logarithmically) with the number of options
    Fitts: the time needed to point at (e.g. click) a target depends on the distance to the target and its width
  • ways to visualize a graph: containment or connection
    node-link diagram (for networks or trees)
    adjacency matrix (for networks or trees)
    enclosure (for trees, not for networks)
    (layered node-link, icicle, radial, concentric circles, ...
  • algorithms for layered node-link diagram, digraphs ...
    Reingold-Tilford
    Sugiyama framework (dot algorithm for heuristics)
    Fruchterman-Reingold
  • radial layout for trees: more space-efficient
    problems: levels are more difficult to read, no clear sorting of children, placement of text
    algorithm: radial variant of Reingold-Tilford
  • def: level graph, hierarchy
    level graph: G(V, E, λ), a directed acyclic graph with a mapping λ that partitions V into layers
    hierarchy: G(V, E, λ), a level graph where for every node in one layer there is a node in a previous layer it ...
  • Sugiyama framework
    step 1: get rid of cycles
    step 2: assign levels to nodes; insert dummy nodes for long edges
    step 3: sort the nodes per level to minimize edge crossings
    step 4: adjust x coordinates to straighten edges
  • dot algorithm: Sugiyama with heuristics for the NP-complete subproblems
    step 1: DFS from a source (or a node with few incoming edges) -> spanning tree -> reverse back edges
    step 2: integer linear program
    step 3: initial layout ...
  • optional user-defined restrictions for force-directed ...
    fixed positions for some nodes
    fixed subgraphs
    variable edge length depending on a parameter/function
  • problems with force-directed layout for undirected ...
    high cost (O(n^2) per iteration, optimized O(n log n))
    easy to get trapped in local minima
    not suited for dense graphs (works for < 1000 nodes with few edges per node)
  • def: scalar field and use cases in 2D and 3D
    (D, C, f): D is the domain of definition, C is the scalar range, f: D -> C
    2D: color map, height field, isolines
    3D: cutting planes, volume rendering, isosurfaces
  • general dataset structure
    geometry: finite set of points
    topology: neighborhood relationships between points
    attributes: one or more functions mapping points in D into other spaces
  • types of grids: uniform, rectilinear, structured, unstructured, scattered
  • isolines: how many cases for triangles and quads?
    triangles: 4 relevant (8 in total)
    quads: 6 relevant (reducible to 4 if + and - are regarded as symmetric), 16 in total; one ambiguous case (+ only at diagonally opposite corners)
  • types of color spaces
    RGB/CMYK: aim: specify colors for technical devices (screen, printer)
    HSV/HSL: aim: easy color specification for humans
    CIE Lab: aim: distances (approximately) match perceived distances
  • direct and indirect volume rendering
    indirect: detour via geometry; the user controls the geometry via parameters (isovalue, cutting plane)
        problem: strict criteria, shows only a small part of the data
    direct: similar to ray tracing, data considered ...
  • Levoy's volume rendering pipeline
    step 1: sampling
    step 2: classification and illumination
    step 3: compositing
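For the horizon graphs card: a minimal sketch of the band split, assuming non-negative values (horizon graphs are often drawn with negative values mirrored in a second hue, which is left out here). `horizon_bands`, `band_height` and `num_bands` are hypothetical names. Each value is cut into equal-height slices; drawing every slice from the same baseline, with a darker color for higher bands, gives the overlaid bands in a fraction of the original height.

```python
def horizon_bands(values, band_height, num_bands):
    """Split each non-negative value into per-band heights.

    Band k holds how much of each value lies in
    [k * band_height, (k + 1) * band_height), clipped to band_height,
    so all bands can be drawn on top of each other from one baseline.
    """
    bands = []
    for k in range(num_bands):
        lower = k * band_height
        # Portion of each value above this band's lower edge, clipped to the band height
        bands.append([min(max(v - lower, 0.0), band_height) for v in values])
    return bands


for k, band in enumerate(horizon_bands([0.5, 1.5, 2.7], band_height=1.0, num_bands=3)):
    print(f"band {k}: {band}")
```

Summing the bands column by column gives back the original values, so no information is lost; only the vertical space shrinks.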
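For the histogram part of the dataset statistics card: a minimal sketch of equal-width binning (the function name and the toy data are illustrative assumptions) that shows the trade-off in the number of bins. Too few bins hide the shape; more bins reveal it but start producing empty bins.

```python
def histogram(data, num_bins):
    """Count values per equal-width bin over [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_bins
    for x in data:
        # The maximum value goes into the last bin instead of one past the end
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
print(histogram(data, 2))  # [8, 2]: too few bins, little information
print(histogram(data, 8))  # [1, 2, 3, 2, 1, 0, 0, 1]: shape visible, empty bins appear
```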
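For the boxplots card: a sketch of the box and whisker positions. The card notes that whisker definitions differ; this example assumes the common Tukey convention (whiskers reach the most extreme points within 1.5 × IQR of the box, points beyond are outliers), and quartiles from Python's `statistics.quantiles` with `method="inclusive"`. Other quartile conventions give slightly different numbers. `boxplot_stats` and `whisker_factor` are hypothetical names.

```python
import statistics


def boxplot_stats(data, whisker_factor=1.5):
    """Box and whisker positions for one variable (needs at least 2 values)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # spread of the middle 50% of the data
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers end at the most extreme data points inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

With this data, 100 is reported as an outlier and the upper whisker ends at 8, so a single extreme value does not stretch the plot.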
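For the parallel coordinates card: a minimal sketch of how records become polylines, assuming min-max normalization per variable so that all axes share the same height (one common choice; others exist). Axis i sits at x = i, and each record yields one polyline whose segments connect neighboring axes, which is why relationships are easiest to read between neighbors. `parallel_coordinates` is a hypothetical name.

```python
def parallel_coordinates(records):
    """Turn records (equal-length tuples of numbers) into polylines of (x, y) points."""
    num_vars = len(records[0])
    mins = [min(r[i] for r in records) for i in range(num_vars)]
    maxs = [max(r[i] for r in records) for i in range(num_vars)]
    polylines = []
    for r in records:
        points = []
        for i, value in enumerate(r):
            span = maxs[i] - mins[i]
            # Normalize to [0, 1]; a constant variable is placed mid-axis
            y = (value - mins[i]) / span if span else 0.5
            points.append((i, y))
        polylines.append(points)
    return polylines


print(parallel_coordinates([(1, 10, 5), (3, 20, 5)]))
```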
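For the parallel coordinate matrix card: a sketch of the classic zigzag construction usually credited to Walecki, not necessarily the exact variant used in the Hurley and Oldford algorithm. For an even number n of variables it yields n/2 axis orders (Hamiltonian paths) in which every pair of variables is adjacent exactly once; for odd n, attaching one extra vertex to both ends of each path gives Walecki's Hamiltonian cycles. All function names are hypothetical.

```python
def zigzag_paths(n):
    """Decompose the complete graph K_n (n even) into n/2 Hamiltonian paths.

    Path j visits j, j+1, j-1, j+2, j-2, ..., j+n/2 (all mod n).
    """
    assert n % 2 == 0, "zigzag paths need an even number of vertices"
    k = n // 2
    paths = []
    for j in range(k):
        path = [j]
        for t in range(1, k):
            path += [(j + t) % n, (j - t) % n]
        path.append((j + k) % n)
        paths.append(path)
    return paths


def walecki_cycles(n):
    """Decompose K_n (n odd, n >= 3) into (n-1)/2 Hamiltonian cycles.

    Vertex n-1 is attached to both ends of each zigzag path on the other n-1 vertices.
    """
    assert n % 2 == 1 and n >= 3
    return [[n - 1] + p + [n - 1] for p in zigzag_paths(n - 1)]


def edges(walk):
    """Undirected edges of a walk given as a list of vertices."""
    return [frozenset(e) for e in zip(walk, walk[1:])]


for row in zigzag_paths(6):
    print(row)  # one row of the PCM: the axis order for 6 variables
```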
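For the radial layouts card: a minimal RadViz sketch, assuming the values of a record are already normalized to [0, 1]. Each variable gets an anchor on the unit circle; the record is placed where "springs" to all anchors, with stiffness equal to the values, balance out, i.e. at the value-weighted mean of the anchors. (Star coordinates instead sum the scaled axis vectors without dividing by the sum of the values.) `radviz` is a hypothetical name.

```python
import math


def radviz(record):
    """RadViz position (x, y) of one record with values in [0, 1]."""
    m = len(record)
    anchors = [(math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m)) for i in range(m)]
    total = sum(record)
    if total == 0:
        return (0.0, 0.0)  # assumed convention: an all-zero record sits at the center
    x = sum(v * ax for v, (ax, _) in zip(record, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(record, anchors)) / total
    return (x, y)


print(radviz([1.0, 0.0, 0.0, 0.0]))  # lands on the first anchor
print(radviz([1.0, 1.0, 1.0, 1.0]))  # balanced record: near the center
```

The second call illustrates a known RadViz weakness: very different records (all-high or all-low values) can both land near the center.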
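For the PCA, variance/covariance and eigendecomposition cards: a sketch using NumPy (`numpy.cov`, `numpy.linalg.eigh`). Note that `np.cov` returns the sample estimate (dividing by n - 1) rather than the expectation E[...] from the card. `eigh` is meant for symmetric matrices such as C and returns eigenvalues in ascending order, so they are re-sorted by decreasing variance. `pca` and `num_components` are hypothetical names.

```python
import numpy as np


def pca(X, num_components):
    """PCA via eigendecomposition of the covariance matrix.

    X has one row per sample and one column per variable. Returns the projected
    data, the principal axes (as columns) and all eigenvalues, largest first.
    """
    Xc = X - X.mean(axis=0)               # center each variable: X - E[X]
    C = np.cov(Xc, rowvar=False)          # covariance matrix (sample estimate)
    eigvals, Q = np.linalg.eigh(C)        # C = Q diag(eigvals) Q^T, ascending order
    order = np.argsort(eigvals)[::-1]     # largest variance first
    eigvals, Q = eigvals[order], Q[:, order]
    W = Q[:, :num_components]             # keep the leading principal axes
    return Xc @ W, W, eigvals


X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
projected, axes, variances = pca(X, 1)
print(axes[:, 0], variances)
```

For points on the diagonal, the first axis is (1, 1)/√2 up to sign and the second eigenvalue is (numerically) zero: one dimension suffices.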
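For the Hick's law and Fitts's law card: the commonly cited textbook forms, which the card states only in words. Here a and b are constants fitted empirically per user and input device; the Fitts formula is the widely used Shannon formulation (Fitts's original term was log₂(2D/W)).

```latex
\begin{aligned}
\text{Hick:}  \quad & T = b \cdot \log_2(n + 1)
  && n = \text{number of equally likely options} \\
\text{Fitts:} \quad & MT = a + b \cdot \log_2\!\left(\frac{D}{W} + 1\right)
  && D = \text{distance to the target},\ W = \text{target width}
\end{aligned}
```

Worked example: with n = 7 options the Hick term is log₂(8) = 3; a target at distance D = 15 with width W = 1 (same unit) has an index of difficulty of log₂(16) = 4 bits, and doubling W lowers it.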
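For the Sugiyama framework and dot algorithm cards: a sketch of steps 1 and 2. Step 1 follows the dot card (DFS starting from nodes with few incoming edges, reverse back edges). For step 2, the simpler longest-path layering is swapped in instead of dot's integer linear program, followed by dummy nodes for edges that span several levels. Assumptions: no self-loops or duplicate edges; recursion depth limits this sketch to small graphs. All names are hypothetical.

```python
def remove_cycles(nodes, edges):
    """Step 1: make the digraph acyclic by reversing DFS back edges."""
    succ = {v: [] for v in nodes}
    indeg = {v: 0 for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    state = {v: "new" for v in nodes}  # new -> active (on DFS stack) -> done
    back = set()

    def dfs(u):
        state[u] = "active"
        for v in succ[u]:
            if state[v] == "active":  # edge to a node on the stack closes a cycle
                back.add((u, v))
            elif state[v] == "new":
                dfs(v)
        state[u] = "done"

    for v in sorted(nodes, key=lambda v: indeg[v]):  # few incoming edges first
        if state[v] == "new":
            dfs(v)
    return [(v, u) if (u, v) in back else (u, v) for u, v in edges]


def longest_path_layers(nodes, edges):
    """Step 2 (simple variant): level = length of the longest path from a source."""
    preds = {v: [] for v in nodes}
    for u, v in edges:
        preds[v].append(u)
    level = {}

    def get_level(v):
        if v not in level:
            level[v] = 1 + max((get_level(u) for u in preds[v]), default=-1)
        return level[v]

    for v in nodes:
        get_level(v)
    return level


def insert_dummies(edges, level):
    """Split edges spanning more than one level into chains of dummy nodes."""
    level = dict(level)
    new_edges = []
    for u, v in edges:
        prev = u
        for lv in range(level[u] + 1, level[v]):
            dummy = ("dummy", u, v, lv)
            level[dummy] = lv
            new_edges.append((prev, dummy))
            prev = dummy
        new_edges.append((prev, v))
    return new_edges, level


nodes = ["a", "b", "c", "d"]
dag = remove_cycles(nodes, [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("c", "d")])
layers = longest_path_layers(nodes, dag)
print(dag, layers)
```

Reversed edges would be drawn with their arrowheads flipped back in the final layout; steps 3 and 4 (crossing minimization, x coordinates) are not covered here.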
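For the force-directed cards (Fruchterman-Reingold, restrictions, problems): a compact sketch in the style of Fruchterman-Reingold, assuming the usual force functions k²/d (repulsion) and d²/k (attraction) and a linearly cooling "temperature". The nested loop over all node pairs is the O(n²) cost per iteration mentioned on the card; the random start is one reason the result can end in a local minimum. The parameters and their defaults are illustrative assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0, iterations=100, seed=0):
    """Force-directed layout; nodes is a list, edges a list of undirected (u, v) pairs."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal edge length
    t0 = width / 10
    temp = t0

    def push(disp, u, v, force):
        """Add force(d) along the direction v -> u to u and the opposite to v."""
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        d = math.hypot(dx, dy) or 1e-9  # avoid division by zero for coincident nodes
        f = force(d)
        disp[u][0] += dx / d * f
        disp[u][1] += dy / d * f
        disp[v][0] -= dx / d * f
        disp[v][1] -= dy / d * f

    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):  # repulsion between all pairs: O(n^2)
            for v in nodes[i + 1:]:
                push(disp, u, v, lambda d: k * k / d)
        for u, v in edges:  # negative force: attraction along edges
            push(disp, u, v, lambda d: -d * d / k)
        for v in nodes:  # move at most `temp`, and stay inside the frame
            length = math.hypot(*disp[v]) or 1e-9
            step = min(length, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + disp[v][0] / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + disp[v][1] / length * step))
        temp = t0 * (1 - (it + 1) / iterations)  # linear cooling
    return {v: (p[0], p[1]) for v, p in pos.items()}


print(fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
```

User-defined restrictions such as fixed node positions could be added by skipping the move step for those nodes.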
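For the isolines card: a sketch that computes the marching-squares case index of a cell (one bit per corner above the isovalue) and counts the case classes, assuming "relevant" means distinct up to rotation of the corners. It reproduces the card's numbers: 16 quad cases in 6 classes (4 with +/- symmetry), 8 triangle cases in 4 classes, and the ambiguous diagonal configuration. All names are hypothetical.

```python
def quad_case(values, isovalue):
    """Case index 0..15; corners in a fixed cyclic order, bit i set if corner i is '+'."""
    index = 0
    for bit, v in enumerate(values):
        if v > isovalue:
            index |= 1 << bit
    return index


def is_ambiguous(case):
    """'+' only at diagonally opposite corners: the isoline can be connected two ways."""
    return case in (0b0101, 0b1010)


def rotate(case, n):
    """Rotate the corner bits of an n-corner cell by one position."""
    return ((case << 1) | (case >> (n - 1))) & ((1 << n) - 1)


def count_classes(n, sign_symmetric=False):
    """Number of case classes of an n-corner cell up to rotation (and optionally +/- swap)."""
    full = (1 << n) - 1
    seen, classes = set(), 0
    for case in range(1 << n):
        if case in seen:
            continue
        classes += 1
        c = case
        for _ in range(n):  # collect the whole orbit of this case
            seen.add(c)
            if sign_symmetric:
                seen.add(c ^ full)
            c = rotate(c, n)
    return classes


print(count_classes(3), count_classes(4), count_classes(4, sign_symmetric=True))
```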
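For the color spaces card: a small illustration with Python's standard `colorsys` module (components as floats in [0, 1]) of why HSV is convenient for humans. Darkening a color only touches the V component while the hue stays fixed; in RGB all three channels would have to change together. `darker` is a hypothetical helper.

```python
import colorsys


def darker(rgb, factor=0.5):
    """Darken an RGB color by scaling the HSV 'value' component."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb(h, s, v * factor)


print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0): pure red
print(darker((1.0, 0.0, 0.0)))             # (0.5, 0.0, 0.0): darker red, same hue
```

Neither RGB nor HSV distances match perceived differences; that is the aim of CIE Lab, which `colorsys` does not provide.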
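For Levoy's volume rendering pipeline: a sketch of step 3 only, for a single ray, assuming steps 1 and 2 already produced (color, opacity) samples ordered from the eye into the volume. It uses front-to-back alpha compositing with early ray termination; color is a single intensity to keep it short (RGB works per channel). The function name and the `early_stop` threshold are assumptions.

```python
def composite_front_to_back(samples, early_stop=0.99):
    """Accumulate (color, alpha) samples along one ray, front to back."""
    color, alpha = 0.0, 0.0
    for c, a in samples:
        color += (1.0 - alpha) * a * c  # attenuated by what lies in front
        alpha += (1.0 - alpha) * a
        if alpha >= early_stop:         # early ray termination: the rest is hidden
            break
    return color, alpha


print(composite_front_to_back([(1.0, 0.5), (0.0, 0.5)]))  # (0.5, 0.75)
print(composite_front_to_back([(0.2, 1.0), (1.0, 1.0)]))  # (0.2, 1.0): second sample hidden
```

Front-to-back order is chosen here because it allows stopping early once the accumulated opacity is nearly 1; back-to-front compositing gives the same image but cannot stop early.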