by chance - enhancing interaction with large data sets through statistical
sampling
Paper accepted for AVI'2002,
Advanced Visual Interfaces, May 22-24, 2002, Trento, ITALY .
Download draft paper as PDF (288Kb)
Full reference:
A. Dix and G. Ellis (2002). By chance - enhancing interaction with large
data sets through statistical sampling. Proceedings of Advanced Visual
Interfaces AVI2002, Trento, Italy, ACM Press. pp.167-176.
http://www.hcibook.com/alan/papers/avi2002/
- See also:
- more on randomness at: http://www.hcibook.com/alan/topics/random/
- related work on visualisation at: http://www.hcibook.com/alan/topics/vis/
Abstract
The use of random algorithms in many areas of computer science has enabled
the solution of otherwise intractable problems. In this paper we propose that
random sampling can make the visualisation of large datasets both more computationally
efficient and more perceptually effective. We review the explicit uses of randomness
and the related deterministic techniques in the visualisation literature. We
then discuss how sampling can augment existing systems. Furthermore, we demonstrate
a novel 2D zooming interface, the Astral Telescope Visualiser, a visualisation
suggested and enabled by sampling. We conclude by considering some general usability
and technical issues raised by sampling-based visualisation.
keywords: random sampling, visualisation, very large data sets, Astral
Telescope Visualiser, sampling from databases
Contents
- 1 Introduction and background
  In which we consider some of the problems of visualising large data
  sets and also some of the uses of randomness in other areas of computing.
- 2 Existing randomness and alternatives
  In which we review existing visualisation techniques that use random
  effects and also techniques that achieve similar aims.
- 3 Using randomness
  In which we suggest ways of using randomness to enhance or enable different
  forms of visualisation and interaction.
- 4 Randomness and interaction
  In which we discuss some of the issues sampling raises for interaction
  and how to choose correct sampling levels.
- 5 Sampling databases
  In which we examine ways of extracting random samples from existing
  databases, look at some research literature on sampling from large data
  sets and see how this may be used to help design bespoke data storage.
- 6 Conclusions
  In which we sum up that randomness is a jolly good thing and the next
  AVI should be held in Monte Carlo :-)
References
- [Ahlberg 1994] C. Ahlberg, B. Shneiderman. Visual Information Seeking:
Tight Coupling of Dynamic Query Filters with Starfield Displays, Proc.
ACM Conference on Human Factors in Computing Systems, CHI '94, Boston, April 1994.
ACM Press. pp.313-317
- [Benford 1994] S. Benford and J. Mariani. Virtual environments for data
sharing and visualisation - populated information terrains. Proc. IDS'94:
The 2nd International Workshop on User Interfaces to Databases. Lancaster,
UK, April 1994. Springer Verlag: Workshops in Computer Science. pp.168-182
- [Benyon 1992] D. Benyon. Task analysis and system design: the discipline
of data. Interacting with Computers. 4(1):246-249, 1992
- [Brodbeck 1997] D. Brodbeck, M. Chalmers, A. Lunzer, P. Cotture. Domesticating
Bead: Adapting an Information Visualization System to a Financial Institution,
Proc. IEEE Information Visualization 97, Phoenix, October 1997. pp.73-80
- [Bruijn 2000] O. de Bruijn and R. Spence. Rapid serial presentation: a space-time trade-off in information presentation. Proc. Advanced Visual Interfaces - AVI2000, ACM Press, 2000, pp.51-60
- [CDMA 2000] CDMA Development Group. What is CDMA (Code Division Multiple
Access)? (accessed 17th Nov 2001, dated © 2000) http://www.cdg.org/tech/about_cdma.asp
- [Chalmers 1999] M. Chalmers. Informatics, Architecture and Language.
Social Navigation in Information Space, A. Munro, K. Hook & D. Benyon (eds.),
Springer, 1999. http://www.dcs.gla.ac.uk/~matthew/papers/socnav.pdf
- [Chaudhuri 1998] S. Chaudhuri, R. Motwani, V. Narasayya. Random Sampling
for Histogram Construction: How much is enough? Proc. ACM SIGMOD98, Seattle,
1998
- [Dix 1992] A. Dix. Human issues in the use of pattern recognition techniques.
Neural Networks and Pattern Recognition in Human Computer Interaction, 1992.
Eds. R. Beale and J. Finlay. Ellis Horwood. pp.429-451 http://www.hcibook.com/alan/papers/neuro92/
- [Dix 1994] A. Dix and A. Patrick. Query By Browsing. Proc. IDS'94:
The 2nd International Workshop on User Interfaces to Databases, Lancaster,
UK, April 1994. Springer Verlag: Workshops in Computer Science. pp.236-248
http://www.hcibook.com/alan/papers/QbB-IDS94/
- [Dix 1996] A. Dix. Time, space and interaction. Proc. FADIVA 3,
Ed. I. Catarci. Gubbio, University of Rome, Italy, 1996. pp.99-103. http://www.hcibook.com/alan/papers/FADIVA/
- [Dugelay 2001] Jean-Luc Dugelay. Digital watermarking (tutorial).
SAICSIT 2001, South African Institute of Computer Scientists and Information
Technologists Annual Conference, University of South Africa, Pretoria, September
2001. pp.25-28
- [Ellis 1994] G.P. Ellis, J.E. Finlay, A.S. Pollitt. HIBROWSE for Hotels:
bridging the gap between user and system views of a database. Proc. IDS'94
2nd International Workshop on User Interfaces to Databases, Lancaster, UK,
April 1994. Springer Verlag: Workshops in Computer Science, pp.45-58
- [Furnas 1986] G. W. Furnas. Generalized Fisheye Views, Proc. ACM
CHI '86, Boston, April 1986. ACM Press. pp.16-23
- [Gheisra 1998] A. Dix. Statistics tutorial: Gheisra a story. meandeviation.com,
1998. http://www.meandeviation.com/tutorials/stats/Gheisra/
- [Guthrie 1989] D. Guthrie. Statistical models and analysis on auditing,
panel on nonstandard mixture of distributions, Statistical Science 4,
pp.2-33
- [Hendley 1995] R.J. Hendley, N.S. Drew, A.M. Wood and R. Beale. Narcissus:
visualizing information. Proc. IEEE Information Visualization'95. IEEE
1995. pp.90-96,146
- [Keim 1994] D. Keim and H.-P. Kriegel. VisDB: database exploration using
multidimensional visualization. IEEE Computer Graphics and Applications,
September 1994. pp.40-49
- [Keim 1999] D. A. Keim and A. Herrmann. The Gridfit Algorithm: An Efficient
and Effective Approach to Visualizing Large Amounts of Spatial Data, Proc.
Visualization '98, Research Triangle Park, NC, 1998, pp.181-188, 531
- [Kohonen 1990] T. Kohonen. The self-organizing map. Proceedings
of the IEEE, 78(9):1464-1480, 1990
- [Kreuseler 1999] M. Kreuseler, H. Schumann. Information visualization
using a new Focus+Context Technique in combination with dynamic clustering
of information space. Proc. NPIV'99 (New Paradigms in Information Visualization
and Manipulation), Missouri, Nov. 1999, pp.1-5
- [Lamping 1996] J. Lamping and R. Rao. Visualizing Large Trees Using
the Hyperbolic Browser, Proc. ACM CHI '96, Vancouver, April 1996. ACM
Press. pp.388-389
- [Lin 1992] X. Lin. Visualization for the document space. Proc. IEEE
Visualisation'92. IEEE, 1992. pp.274-281
- [Lin 1997] X. Lin. Map displays for information retrieval. Journal
of the American Society for Information Science, 48(1):40-54, 1997
- [Manku 1999] G. S. Manku, S. Rajagopalan and B. G. Lindsay. Random sampling
techniques for space efficient online computation of order statistics of large
datasets. Proc. SIGMOD’99 Int'l Conf. on Management of Data, Philadelphia,
May 1999, ACM Press, pp.251-262
- [Netmap 2001] NetMap Link Analysis: making the invisible, visible.
2001
http://www.netmap.com/ > Presentations
> Link Analysis
- [Olken 1986] F. Olken, D. Rotem. Random Sampling from Relational Databases.
Proc. VLDB'86 Twelfth International Conference on Very Large Data Bases, August
1986, Kyoto, Japan, Morgan Kaufmann, pp.160-169
- [Olken 1993] F. Olken. Random Sampling from Databases. Ph.D. dissertation,
UC Berkeley, April 1993, LBL Technical Report 32883
- [Piatetsky-Shapiro 1984] G. Piatetsky-Shapiro, C. Connell. Accurate
estimation of the number of tuples satisfying a condition. Proc. SIGMOD’84,
Boston, June 1984. ACM Press. pp.256-276
- [Pirolli 1996] P. Pirolli, P. Schank, M. Hearst, C. Diehl. Scatter/Gather
browsing communicates the topic structure of a very large text collection,
Proc. CHI'96, Vancouver, May 1996, ACM Press, pp.213-220
- [Pirolli 1997] P. Pirolli. Computational Models of Information Scent-Following
in a very Large Browsable Text Collection. Proc. Conference on Human Factors
in Computing Systems, CHI'97, Atlanta, March 1997, ACM Press, pp.3-10
- [QbB 2001] A. Dix. Query-by-Browsing on the Web. meandeviation.com,
2001. http://www.meandeviation.com/qbb/qbb.php
- [Raman 1998] R. Raman. Random Sampling Techniques in Parallel Computation.
IPPS/SPDP Workshops 1998, pp.351-360
- [Rao and Card 1994] R. Rao, S. Card. The Table Lens: Merging graphical
and symbolic representations in an interactive focus + context visualization
for tabular information, Proc. CHI'94, Boston, ACM Press, 1994, pp.111-117
- [Salton 1994] G. Salton, J. Allan, C. Buckley and A. Singhal. Automatic
analysis, theme generation and summarization of machine-readable texts.
Science, 264:1421-1426, 1994
- [Schneier 1996] B. Schneier. Applied Cryptography second edition.
Wiley, 1996
- [Shneiderman 1998] B. Shneiderman. Designing the User Interface,
Third Edition. Addison-Wesley, 1998
- [Skalak 1994] D. B. Skalak. Prototype and feature selection by sampling
and random mutation hill climbing algorithms. In Proceedings of the Eleventh
International Machine Learning Conference. New Brunswick, NJ: Morgan Kaufmann,
1994, pp.293-301
- [Spence 2001] R. Spence. Information Visualisation. Addison-Wesley,
2001
- [Tweedie 1994] L. Tweedie, R. Spence, D. Williams, R. Bhogal. The Attribute
Explorer. Video Proceedings CHI'94. ACM Press, 1994
- [Tweedie 1995] L. Tweedie, R. Spence, H. Dawkes and H. Su. The Influence
Explorer. Companion Proceedings CHI'95. ACM Press, 1995, pp.129-130
- [Tweedie 1996] L. Tweedie, R. Spence, H. Dawkes and H. Su. Externalizing
abstract mathematical models. Proc. CHI'96. ACM Press, 1996, pp.406-412
- [Tweedie 1997] L. Tweedie. Characterizing interactive externalizations.
Proc. CHI'97. ACM Press, 1997, pp.375-382
- [Williamson] C. Williamson, B. Shneiderman. The Dynamic HomeFinder:
Evaluating dynamic queries in a real-estate information exploration system,
Proc. SIGIR’92, ACM Press, pp.339-346
- [Woodruff 1998a] A. Woodruff, J. Landay and M. Stonebraker. Constant
Information Density in Zoomable Interfaces. Advanced Visual Interfaces
'98, L'Aquila, Italy, pp.57-65
- [Woodruff 1998b] A. Woodruff, J. Landay and M. Stonebraker. Constant
Density Visualizations of Non-Uniform Distributions of Data. Proc. UIST'98,
San Francisco, 1998, pp.19-28
Alan Dix 31/1/2002