At the right time:
Alan Dix
Jason Marshall
In Proceedings of Human Computer Interaction, International 2003.
keywords: web history, favourites, bookmarks, navigation
This paper describes an empirical study of the effects of different sorting times for web bookmarks and history: incrementally 'during' browsing or all together 'after' browsing.
Studies by Tauscher and Greenberg have shown considerable revisiting of the same web page during browsing. Some of this is clearly due to backing up after following mistaken links or to hub-and-spoke behaviour; indeed Catledge and Pitkow found that 30% of all navigation is the use of the 'back' button. However, there is a considerable residual number of pages that are 'really' revisited because the user wants to see the content again. Browsers support this behaviour both in the short term (back button and visit stack) and in the long term (history, bookmarks, favourites). In a formal analysis of several hypertext and web browsers, Dix and Mancini found that the history and back mechanisms were subtly different in each one. This reinforces results from other studies showing that users find back and history confusing, which is reflected in their behaviour: history and multi-step back are used comparatively little. Bookmarks are more heavily used, but are still known to have many problems. Some more radical interfaces have been proposed and used at an experimental level, including the data mountain, which allows users to arrange thumbnails of bookmarked pages in a 2D landscape, and Kaasten and Greenberg's interface unifying history and bookmarks.
In order to understand these issues further we conducted a series of exploratory interviews aimed initially at visualisation requirements for web history. However, one issue arose repeatedly during the interviews: the interviewees' desire to classify bookmarks/favourites at the moment they were 'remembered' rather than as a secondary exercise. Currently all common browsers force a bookmark-now, sort-later mode of working. The strength of the interviewees' reactions led us to refocus our empirical studies on understanding this 'when to classify' issue in more depth.
In the full paper we report in detail the results of experiments comparing sorting during browsing with sorting afterwards, and the effect of each on later (online) recall. Although the interviewees had expressed a desire for 'during' sorting, we postulated that this would in fact lead to less clear classification. This is because when items are sorted during the process of browsing, the full range of future pages that will require classification is not yet clear, whereas when classifying pages after navigation it is easier to produce a balanced and sensible classification. For example, if the first few pages seen are about aspects of football, should one classify them as 'football' pages or 'sport' pages? If the following pages include one further page on golf and many on completely different topics then 'sport' would have been the best classification. However, if the rest of the pages were about golf, rugby, cricket etc., then 'football' would have been best. Classification 'after' is able to take this into account. The quality of participants' classification was judged by asking them to use the classified bookmarks to answer a series of questions. Our hypothesis was indeed borne out by the results, which showed significantly better recall for the 'after' condition than for 'during' sorting. In a post-test questionnaire the participants also preferred the 'after' sorting, in direct contrast to the interviewees' imagined preference. Other results included a correlation between time spent sorting and performance.
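The football example can be made concrete with a small sketch. This is only an illustration of the argument, not the procedure used in the study: the labelling rule (file by the specific topic when one broad topic covers more than half of the known pages, otherwise by the broad topic), the function names and the sample session are all assumptions introduced here.

```python
from collections import Counter


def label(page, known_pages):
    """Pick a folder name for `page` given only the pages in `known_pages`.

    Hypothetical rule: if the page's broad topic covers more than half of
    the known pages, file by the specific topic; otherwise by the broad one.
    """
    broad, specific = page
    broad_counts = Counter(b for b, _ in known_pages)
    dominant = broad_counts[broad] > len(known_pages) / 2
    return specific if dominant else broad


def sort_during(pages):
    # Each page is filed as it is visited, using only the pages seen so far.
    return [label(page, pages[: i + 1]) for i, page in enumerate(pages)]


def sort_after(pages):
    # All pages are filed at the end, with the whole session in view.
    return [label(page, pages) for page in pages]


# Invented session: (broad topic, specific topic) in visiting order.
session = [
    ("sport", "football"),
    ("sport", "football"),
    ("sport", "golf"),
    ("news", "politics"),
    ("travel", "flights"),
    ("food", "recipes"),
    ("music", "reviews"),
]

print(sort_during(session))
print(sort_after(session))
```

Under this assumed rule, 'during' sorting files the early pages under 'football' and 'golf' because sport dominates what has been seen so far, while 'after' sorting groups all three under 'sport' once the other topics are known. If every page in the session were about sport, both orders would give the same specific folders, which matches the point that early commitment is only a problem when the later pages change the picture.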
A small number of participants were retested a week later. The number was very small (only four participants) and so any results are merely suggestive. However, it did appear that the advantage of 'after' over 'during' sorting disappeared almost completely. If the quality of the 'during' classification were indeed worse, one would have expected even worse results on retesting, after the immediate memory of the classification process had faded.
As the numbers were too small for statistical testing these counter-intuitive results may well be just a random effect. However, they have made us question whether the strength of the 'after' sorting may be partly explained by the fact that the sorting process occurs closer to the post-test. More sophisticated experiments may be required to separate all the potential causes.
There are two main lessons from this last experiment. First, the interviewees were able to articulate desires relating to meta-knowledge issues such as the timing of bookmark classification. Second, the actual running of the experiment showed how complex these issues are, especially because we are looking at relatively long-term effects that are hard to capture fully within a laboratory setting. We deliberately chose an experimental setup that was at least partially ecologically valid rather than a more controlled and specific pure psychological experiment. This allowed us to find some real and strong effects, but by its nature admits multiple interpretations.