by chance - enhancing interaction with large data sets through statistical sampling

Alan Dix
Lancaster University, UK.
email: alan@hcibook.com

Geoffrey Ellis
Huddersfield University, UK.
email: g.p.ellis@hud.ac.uk

Paper accepted for AVI'2002, Advanced Visual Interfaces, May 22-24, 2002, Trento, Italy.

Full reference:

A. Dix and G. Ellis (2002). By chance - enhancing interaction with large data sets through statistical sampling. Proceedings of Advanced Visual Interfaces AVI2002, Trento, Italy, ACM Press. pp.167-176.
http://www.hcibook.com/alan/papers/avi2002/

See also: more on randomness at: http://www.hcibook.com/alan/topics/random/; related work on visualisation at: http://www.hcibook.com/alan/topics/vis/

Abstract

The use of random algorithms in many areas of computer science has enabled the solution of otherwise intractable problems. In this paper we propose that random sampling can make the visualisation of large datasets both more computationally efficient and more perceptually effective. We review the explicit uses of randomness and the related deterministic techniques in the visualisation literature. We then discuss how sampling can augment existing systems. Furthermore, we demonstrate a novel 2D zooming interface, the Astral Telescope Visualiser, a visualisation both suggested and enabled by sampling. We conclude by considering some general usability and technical issues raised by sampling-based visualisation.

keywords: random sampling, visualisation, very large data sets, Astral Telescope Visualiser, sampling from databases
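The abstract's claim that sampling makes large data sets cheaper to draw, and that it enables zooming interfaces, can be illustrated with a small sketch. This is not the Astral Telescope Visualiser's own algorithm. It assumes a simple scheme, Bernoulli sampling with a fixed per-item random key, in which the sampling rate p is set so that roughly a hypothetical `target` number of points is drawn in the current viewport. The `Item` type, the `make_items` test data and the `visible_sample` function are illustrative names only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    x: float
    y: float
    key: float  # fixed uniform random key in [0, 1), assigned once per item


def make_items(n, seed=0):
    """Hypothetical test data: n points scattered uniformly over the unit square."""
    rng = random.Random(seed)
    return [Item(rng.random(), rng.random(), rng.random()) for _ in range(n)]


def visible_sample(items, x0, y0, x1, y1, target=200):
    """Return a random subset of the items inside the viewport [x0, x1) x [y0, y1).

    The sampling rate p = min(1, target / n_in_view) aims at about `target`
    points on screen at any zoom level (an assumed, illustrative policy).
    """
    in_view = [it for it in items if x0 <= it.x < x1 and y0 <= it.y < y1]
    if not in_view:
        return []
    p = min(1.0, target / len(in_view))
    # Keep an item when its fixed key falls below p. Each item is kept
    # independently with probability p, and a higher p always keeps a superset.
    return [it for it in in_view if it.key < p]


items = make_items(100_000)
overview = visible_sample(items, 0.0, 0.0, 1.0, 1.0)  # whole data set
zoomed = visible_sample(items, 0.0, 0.0, 0.1, 0.1)    # 1% of the area
print(len(overview), len(zoomed))
```

Because each item's key never changes, zooming into a smaller window (which raises p) only adds points to those already shown inside it, so the display does not jump between unrelated samples. A real system would use a spatial index rather than scanning every item for each view.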

Contents

1 Introduction and background: In which we consider some of the problems of visualising large data sets and also some of the uses of randomness in other areas of computing.

2 Existing randomness and alternatives: In which we review existing visualisation techniques that use random effects and also techniques that achieve similar aims.

3 Using randomness: In which we suggest ways of using randomness to enhance or enable different forms of visualisation and interaction.

4 Randomness and interaction: In which we discuss some of the issues sampling raises for interaction and how to choose correct sampling levels.

5 Sampling databases: In which we examine ways of extracting random samples from existing databases, look at some research literature on sampling from large data sets and see how this may be used to help design bespoke data storage.

6 Conclusions: In which we sum up that randomness is a jolly good thing and the next AVI should be held in Monte Carlo :-)
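Section 5 concerns extracting random samples from existing databases. As one standard illustration, swapped in here rather than taken from the paper, reservoir sampling (Vitter's Algorithm R) draws a uniform sample of k rows in a single pass over a query result, without knowing the number of rows in advance and holding only k rows in memory. The in-memory SQLite table `items`, its columns and k = 100 are hypothetical.

```python
import random
import sqlite3


def reservoir_sample(rows, k, rng=None):
    """Algorithm R: uniform random sample of up to k rows from any iterable,
    in one pass and O(k) memory."""
    if rng is None:
        rng = random.Random()
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)      # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive
            if j < k:
                sample[j] = row     # row i is kept with probability k/(i+1)
    return sample


# Hypothetical table standing in for a large existing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [(float(i % 200),) for i in range(10_000)])

# Iterating the cursor streams rows, so the sampler never builds a list of all rows.
cursor = conn.execute("SELECT id, value FROM items")
sample = reservoir_sample(cursor, k=100, rng=random.Random(42))
print(len(sample))
```

This approach still reads every row of the query result, which may matter for very large tables; [Olken 1993] studies random sampling from databases in more depth.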

References

[Ahlberg 1994] C. Ahlberg, B. Shneiderman. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays, Proc. ACM Conference on Human Factors in Computing Systems, CHI '94, Boston, April 1994. ACM Press. pp.313-317
[Benford 1994] S. Benford and J. Mariani. Virtual environments for data sharing and visualisation: populated information terrains. Proc. IDS'94: The 2nd International Workshop on User Interfaces to Databases. Lancaster, UK, April 1994. Springer Verlag: Workshops in Computer Science. pp.168-182
[Benyon 1992] D. Benyon. Task analysis and system design: the discipline of data. Interacting with Computers. 4(1):246-249, 1992
[Brodbeck 1997] D. Brodbeck, M. Chalmers, A. Lunzer, P. Cotture. Domesticating Bead: Adapting an Information Visualization System to a Financial Institution, Proc. IEEE Information Visualization 97, Phoenix, October 1997. pp.73-80
[Bruijn 2000] O. de Bruijn and R. Spence. Rapid serial presentation: a space-time trade-off in information presentation. Proc. Advanced Visual Interfaces - AVI2000, ACM Press, 2000, pp.51-60
[CDMA 2000] CDMA Development Group. What is CDMA (Code Division Multiple Access)? (accessed 17th Nov 2001, dated © 2000) http://www.cdg.org/tech/about_cdma.asp
[Chalmers 1999] M. Chalmers. Informatics, Architecture and Language. Social Navigation in Information Space, A. Munro, K. Hook & D. Benyon (eds.), Springer, 1999. http://www.dcs.gla.ac.uk/~matthew/papers/socnav.pdf
[Chaudhuri 1998] S. Chaudhuri, R. Motwani, V. Narasayya. Random Sampling for Histogram Construction: How much is enough? Proc. ACM SIGMOD'98, Seattle, 1998
[Dix 1992] A. Dix. Human issues in the use of pattern recognition techniques. Neural Networks and Pattern Recognition in Human Computer Interaction, 1992. Eds. R. Beale and J. Finlay. Ellis Horwood. pp.429-451 http://www.hcibook.com/alan/papers/neuro92/
[Dix 1994] A. Dix and A. Patrick. Query By Browsing. Proc. IDS'94: The 2nd International Workshop on User Interfaces to Databases, Lancaster, UK, April 1994. Springer Verlag: Workshops in Computer Science. pp.236-248 http://www.hcibook.com/alan/papers/QbB-IDS94/
[Dix 1996] A. Dix. Time, space and interaction. Proc. FADIVA 3, Ed. I. Catarci. Gubbio, University of Rome, Italy, 1996. pp.99-103. http://www.hcibook.com/alan/papers/FADIVA/
[Dugelay 2001] Jean-Luc Dugelay. Digital watermarking (tutorial). SAICSIT 2001, South African Institute of Computer Scientists and Information Technologists Annual Conference, University of South Africa, Pretoria, September 2001. pp.25-28
[Ellis 1994] G.P. Ellis, J.E. Finlay, A.S. Pollitt. HIBROWSE for Hotels: bridging the gap between user and system views of a database. Proc. IDS'94: The 2nd International Workshop on User Interfaces to Databases, Lancaster, UK, April 1994. Springer Verlag: Workshops in Computer Science, pp.45-58
[Furnas 1986] G. W. Furnas. Generalized Fisheye Views, Proc. ACM CHI '86, Boston, April 1986. ACM Press. pp.16-23
[Gheisra 1998] A. Dix. Statistics tutorial: Gheisra a story. meandeviation.com, 1998. http://www.meandeviation.com/tutorials/stats/Gheisra/
[Guthrie 1989] D. Guthrie. Statistical models and analysis on auditing, panel on nonstandard mixture of distributions, Statistical Science 4, 1989, pp.2-33
[Hendley 1995] R.J. Hendley, N.S. Drew, A.M. Wood and R. Beale. Narcissus: visualizing information. Proc. IEEE Information Visualization'95. IEEE 1995. pp.90-96,146
[Keim 1994] D. Keim and H-P. Kriegel. VisDB: database exploration using multidimensional visualization. IEEE Computer Graphics and Applications, September 1994. pp.40-49
[Keim 1999] D. A. Keim and A. Herrmann. The Gridfit Algorithm: An Efficient and Effective Approach to Visualizing Large Amounts of Spatial Data, Proc. Visualization '98, Research Triangle Park, NC, 1998, pp.181-188, 531
[Kohonen 1990] T. Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464-1480, 1990
[Kreuseler 1999] M. Kreuseler, H. Schumann. Information visualization using a new Focus+Context Technique in combination with dynamic clustering of information space. Proc. NPIV'99 (New Paradigms in Information Visualization and Manipulation), Missouri, Nov. 1999, pp.1-5
[Lamping 1996] J. Lamping and R. Rao. Visualizing Large Trees Using the Hyperbolic Browser, Proc. ACM CHI '96, Vancouver, April 1996. ACM Press. pp.388-389
[Lin 1992] X. Lin. Visualization for the document space. Proc. IEEE Visualisation'92. IEEE, 1992. pp.274-281
[Lin 1997] X. Lin. Map displays for information retrieval. Journal of the American Society for Information Science, 48(1):40-54, 1997
[Manku 1999] G. S. Manku, S. Rajagopalan and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. Proc. SIGMOD'99 Int'l Conf. on Management of Data, Philadelphia, May 1999, ACM Press, pp.251-262
[Netmap 2001] NetMap Link Analysis: making the invisible, visible. 2001. http://www.netmap.com/ > Presentations > Link Analysis
[Olken 1986] F. Olken, D. Rotem. Random Sampling from Relational Databases. Proc. VLDB'86 Twelfth International Conference on Very Large Data Bases, August 1986, Kyoto, Japan, Morgan Kaufmann, pp.160-169
[Olken 1993] F. Olken. Random Sampling from Databases. Ph.D. dissertation, UC Berkeley, April 1993, LBL Technical Report 32883
[Piatetsky-Shapiro 1984] G. Piatetsky-Shapiro, C. Connell. Accurate estimation of the number of tuples satisfying a condition. Proc. SIGMOD'84, Boston, June 1984. ACM Press. pp.256-276
[Pirolli 1996] P. Pirolli, P. Schank, M. Hearst, C. Diehl. Scatter/Gather browsing communicates the topic structure of a very large text collection, Proc. CHI'96, Vancouver, May 1996, ACM Press, pp.213-220
[Pirolli 1997] P. Pirolli. Computational Models of Information Scent-Following in a very Large Browsable Text Collection. Proc. Conference on Human Factors in Computing Systems, CHI'97, Atlanta, March 1997, ACM Press, pp.3-10
[QbB 2001] A. Dix. Query-by-Browsing on the Web. meandeviation.com, 2001. http://www.meandeviation.com/qbb/qbb.php
[Raman 1998] R. Raman. Random Sampling Techniques in Parallel Computation. IPPS/SPDP Workshops 1998, pp.351-360
[Rao and Card 1994] R. Rao, S. Card. The Table Lens: Merging graphical and symbolic representations in an interactive focus + context visualization for tabular information, Proc. CHI'94, Boston, ACM Press, 1994, pp.111-117
[Salton 1994] G. Salton, J. Allan, C. Buckley and A. Singhal. Automatic analysis, theme generation and summarization of machine-readable texts. Science, 264:1411426, 1994
[Schneier 1996] B. Schneier. Applied Cryptography second edition. Wiley, 1996
[Shneiderman 1998] B. Shneiderman. Designing the User Interface, Third Edition. Addison-Wesley, 1998
[Skalak 1994] D. B. Skalak. Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Proceedings of the Eleventh International Machine Learning Conference. New Brunswick, NJ: Morgan Kaufmann, 1994, pp.293-301
[Spence 2001] R. Spence. Information Visualisation. Addison-Wesley, 2001
[Tweedie 1994] L. Tweedie, R. Spence, D. Williams, R. Bhogal. The Attribute Explorer. Video Proceedings CHI'94. ACM Press, 1994
[Tweedie 1995] L. Tweedie, R. Spence, H. Dawkes and H. Su. The Influence Explorer. Companion Proceedings CHI'95. ACM Press, 1995, pp.129-130
[Tweedie 1996] L. Tweedie, R. Spence, H. Dawkes and H. Su. Externalizing abstract mathematical models. Proc. CHI'96. ACM Press, 1996, pp.406-412
[Tweedie 1997] L. Tweedie. Characterizing interactive externalizations. Proc. CHI'97. ACM Press, 1997, pp.375-382
[Williamson] C. Williamson, B. Shneiderman. The Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information exploration system, Proc. SIGIR'92, ACM Press, 1992, pp.339-346
[Woodruff 1998a] A. Woodruff, J. Landay and M. Stonebraker. Constant Information Density in Zoomable Interfaces. Proc. Advanced Visual Interfaces - AVI'98, L'Aquila, Italy, 1998, pp.57-65
[Woodruff 1998b] A. Woodruff, J. Landay and M. Stonebraker. Constant Density Visualizations of Non-Uniform Distributions of Data. Proc. UIST'98, San Francisco, 1998, pp.19-28

Alan Dix 31/1/2002
