8 - The state of screen reading today

8.1 - Research into reading on screen

Now that we understand how the printed book works, we can turn at last to the computer screen and compare it with print. An understanding of the OSPREY principle, and how books and other printed documents use it in order to capture and hold our attention, reveals just how poor reading from the screen is by comparison. However, this understanding also reveals methods for dramatically improving screen displays – to the point where screen reading can become widely accepted.

Before we move forward, though, we need to look back at the research so far, and the conclusions of the researchers.

A great deal of research has been done over the past 20 or so years into reading on the screen. It began in the days of primitive computer displays – known as VDUs, or visual display units – which offered only the most rudimentary rendering of characters: crude character shapes, flicker, and typically green or amber text on a black background.

It was clear that such screens were completely unsuitable for protracted use. Operators complained of eyestrain; health agencies, labor unions and others in many countries were successful in introducing mandatory limits on continuous working. Many early research studies looked at the readability of such screens and found them wanting – not surprisingly.

Screen displays evolved dramatically, especially with the introduction of Graphical User Interfaces (GUIs) and computer graphics cards capable of displaying higher resolutions.

Computer displays – especially in software designed for document creation – have tried as much as possible to emulate paper, with black text on a “white” background.

Much of the research into readability on the screen has been overtaken by the rapid development of GUIs, as have standards for screen readability drawn up by organizations such as the European Commission. For example, a screen readability standard that is about to become a legal requirement in Europe, ISO 9241, is clearly based on pre-GUI assumptions. For instance, the standard specifies that screen display of characters should ensure that no two characters are presented so close together that they touch.

This type of requirement made sense in the days of low-resolution displays. However, as screen graphics have evolved and resolutions improved, such standards are increasingly out of touch. Two important tools of the typographer’s trade are ligatures (two- or even three-letter combinations which not only touch, but merge into a single glyph) and kerning, in which certain pairs of letters are moved more closely together (perhaps even touching) in order to improve the optically-perceived spacing and make words more readable.

When resolutions and graphics capability were low, this kind of functionality was impractical on the screen. As both improve, these refinements have become not just desirable, but essential, for readability.
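By way of illustration, here is a minimal sketch of these two operations. The lookup tables are hypothetical placeholders; a real rendering engine reads this data from the font itself (for example, its GSUB and kern tables).

```python
# A minimal sketch of ligature substitution and kerning, assuming
# hypothetical lookup tables rather than real font data.

# Letter sequences mapped to single ligature glyphs.
LIGATURES = {"ffi": "\ufb03", "fi": "\ufb01", "fl": "\ufb02"}

# Illustrative kerning pairs: adjustment in 1/1000 em (negative = closer).
KERN_PAIRS = {("A", "V"): -80, ("T", "o"): -60, ("W", "a"): -40}

def substitute_ligatures(text: str) -> str:
    """Replace letter sequences with their single-glyph ligature forms."""
    # Apply longest sequences first so "ffi" wins over "fi".
    for seq, glyph in sorted(LIGATURES.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(seq, glyph)
    return text

def kerning_adjustments(glyphs: str) -> list[int]:
    """Per-gap spacing tweaks, applied between each pair of glyphs."""
    return [KERN_PAIRS.get((a, b), 0) for a, b in zip(glyphs, glyphs[1:])]

print(substitute_ligatures("office file"))   # ligature glyphs merge letters
print(kerning_adjustments("AVonT"))          # the 'AV' pair pulled together
```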

Research into readability on the screen needs to be viewed against the background of rapid development, in both software and hardware.

For example, almost all of the studies carried out so far have used “traditional” computer displays employing Cathode Ray Tubes (CRTs).

CRT characteristics are fundamentally different from paper. CRTs have inherent flicker which is perceptible at low refresh rates, and the solution so far has been to increase refresh rates to a level at which flicker becomes “imperceptible”. Whether this has truly solved the problem, or whether flicker continues to contribute to screen-reading fatigue at a more subtle level, remains to be seen.

CRTs are capable of producing very bright displays that, if not carefully controlled, can add an unacceptable level of glare.

Over the past few years, the growth in market share of portable computers – laptops, notebooks, handheld devices and so on – has led to rapid evolution of flat-panel display technology.

Early displays were monochrome, and of relatively low resolution. But Liquid Crystal Display (LCD) screens, as well as more esoteric technologies such as plasma displays, have rapidly improved.

Early LCD displays suffered from poor contrast, and in portable machines, where power requirements have to be kept low to prolong working time on batteries, this remains an issue. However, battery technology continues to evolve, and new on-glass circuitry and new screen materials such as polysilicon – now beginning to ship in devices with screen resolutions of 200 dpi or more – are greatly improving screen aperture, increasing screen brightness, and lowering backlighting power requirements.

None of the screen readability research – apart from our own Microsoft work – takes into account the major leap forward in displaying text on LCD screens provided by the ClearType™ technology.

8.2 - Optimization of Reading of Continuous Text

Paul Muter at the University of Toronto, in his paper “Interface Design and Optimization of Reading of Continuous Text” (© 1996, Ablex Publishing Corp), notes that we do not yet know how to optimize reading via electronic equipment, but goes on to suggest that many of the factors which affect readability of print also apply, for example:

  • Upper-case print, italics and right justification by inserting blanks result in slower reading.
  • Black text on a white background is read faster than the reverse, and most readers prefer it.
  • There is no effect of margins, serifs or typeface in general, within reasonable limits.
  • Effects of type size, line length and interline spacing interact.

Muter’s conclusions on right justification reflect a well-understood typographic issue which has been addressed earlier in this paper, i.e. poor justification is worse than no justification at all. Simplistic justification, which merely introduces additional spaces between words, disrupts the reading gait by breaking the pattern of more-or-less fixed space between words. Fully-justified text – which requires hyphenation to ensure word-spacing is constant – is standard practice in publishing. It is only in recent years that this has been implemented in “standard” applications such as word-processing; previously it was available only in “professional” desktop publishing applications.
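To see why simplistic justification disrupts the reading gait, consider this small sketch of the naive approach. The example lines are my own; what matters is that the inter-word gaps come out uneven and vary from line to line.

```python
# A sketch of naive "fill justification": padding a line to the target
# measure by distributing extra whole spaces between words.

def fill_justify(words: list[str], width: int) -> str:
    """Justify one line by inserting extra spaces between words."""
    if len(words) < 2:
        return words[0].ljust(width)
    slack = width - sum(len(w) for w in words)   # spaces to hand out
    base, extra = divmod(slack, len(words) - 1)
    line = ""
    for i, word in enumerate(words[:-1]):
        # Earlier gaps absorb the remainder, so some gaps are wider.
        line += word + " " * (base + (1 if i < extra else 0))
    return line + words[-1]

print(fill_justify(["Simplistic", "justification", "widens", "the", "gaps"], 46))
print(fill_justify(["between", "words", "differently", "on", "every", "line"], 46))
```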

The key term in his conclusions on margins, serifs or typeface in general is “within reasonable limits”. This is shorthand for “within limits already established in text production”, i.e. by hundreds of years of publishing usage.

Many of the key differences between print and screen identified by Muter and others – differences that could account for the slower reading observed on the computer screens of the 1980s – are addressed by Microsoft’s ClearType technology. These are:

  • Resolution
  • Edge sharpness
  • Character shape
  • Stroke width of characters
  • Actual size of characters
  • Characters per line
  • Lines per page
  • Words per page
  • Inter-line spacing

The benefits of ClearType are not only in character shape, edge sharpness, resolution, and stroke width. Because ClearType makes it possible to produce excellent character shapes at “small” sizes – the sizes people normally read in print – the technology also solves the problems of character size and thus number of characters per line, lines per page, words per page and inter-line spacing.
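The core idea can be sketched as follows: treat the red, green and blue stripes of each LCD pixel as independently addressable samples, tripling the effective horizontal resolution. This is an illustrative simplification, not Microsoft’s actual algorithm, which also filters the samples to suppress color fringing.

```python
# An illustrative sketch of the subpixel idea, assuming a horizontally
# striped RGB panel and black text on a white background.

def subpixel_row(coverage: list[float]) -> list[tuple[int, int, int]]:
    """Map glyph coverage sampled at 3x horizontal resolution onto whole
    pixels, one coverage sample per R, G and B subpixel."""
    pixels = []
    for i in range(0, len(coverage) - 2, 3):
        # Each subpixel darkens independently, giving three times the
        # horizontal positioning precision of whole-pixel rendering.
        r, g, b = (int(255 * (1 - c)) for c in coverage[i:i + 3])
        pixels.append((r, g, b))
    return pixels

# A vertical stem whose edge falls inside a pixel: with whole pixels the
# edge must round to a pixel boundary; with subpixels it need not.
row = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]   # two pixels, six subpixel samples
print(subpixel_row(row))                # [(255, 255, 0), (0, 0, 255)]
```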

Other factors mentioned by Muter, such as the effect of margins, method for text advancement, and so on, are addressed by the OSPREY reading engine.

Again, Muter reaches the same conclusion as regards screen readability that Tinker reached for print: “It is quite clear that no single variable accounts for the obtained differences in performance between CRTs and paper. Several of the … variables, including resolution, interline spacing, polarity and edge sharpness contribute to the effect”.

“With a more modern system, including a large, higher-resolution screen with dark characters on a light background, reading from a computer can be as efficient as reading from a book.”

(Muter & Maurutto, 1991)

Here Muter contradicts himself slightly: if line length is important (which he agrees it is), then specifying a “large, higher-resolution screen” is a mistake. The higher resolution is the benefit, NOT the size. Of course, at the time Muter was writing, and since he was comparing print with CRTs, high-resolution screens were only available in large sizes. This probably explains the slip.

Muter’s work also explored the issues of color and flicker, which relate to phosphor patterns on CRT screens and refresh rates, neither of which is as relevant on LCD displays.

Muter also reiterates previous findings that paging is superior to scrolling in terms of both performance and user preference.

Muter does quote one finding (Nas, 1988) which suggests reading is slower if words are hyphenated at the ends of lines. It is likely that such a disadvantage, if it exists, is outweighed by the cueing benefits of justification AND constant word-spacing. These two requirements are mutually exclusive unless words are hyphenated.

Muter also examines various dynamic text presentation systems such as Rapid Serial Visual Presentation, in which single words are flashed on the screen in rapid succession.

His conclusion, however, is that despite the large number of published experiments on reading text from computers, no system has been found which is more efficient, in terms of both speed and comprehension, than the book.

Muter suggests that a likely reason for this is that the “bottleneck” is in the processing in the human brain, and that the technology of the book is optimal, having evolved over several centuries.

8.3 - Paper versus screen

“Reading from paper versus screens: a critical review of the empirical literature” (Dillon, 1992).

This is an excellent review of a great deal of the work done up to 1992, which was dominated by efforts to overcome the speed deficits resulting from poor image quality. Dillon highlights the fact that the emerging literature revealed a more complex set of variables at work. His review considered the differences between paper and screen in terms of both the outcomes and the processes of reading, and concluded that single-variable explanations failed to capture the range of issues involved. He pointed out that existing research was dominated by problems found when reading from VDUs (generally, green or white text on a black background), and that testing methodologies, experiment design, and subject selection were frequently flawed.

By far the most common finding, he said, was that reading from the screen is between 20 and 30 percent slower than from paper.

Some of the test methodologies used by researchers are almost unbelievable, and completely unrelated to the normal reading experience. For example, in one study by Muter et al in 1982, subjects were asked to read white characters 1cm high, displayed on a blue screen, at a reading distance of 5 meters – in a “well-illuminated room”.

In this study, it also took nine seconds to repaint each screenful of information.

Other studies carried out at that time used similarly skewed test methodologies, for example, comparing printed text with characters 4mm high with green text 3mm high on a black screen background. Given Tinker and Paterson’s work on legibility in print, it was hardly surprising that researchers found screen reading slower and less acceptable.

Most of these studies were carried out in the 1980s using older displays (then referred to as Visual Display Units – presumably for the Visual Display Operatives who would run them). (Heppner, Anderson & Farstrup, 1985)

Later studies, using computers with GUIs and thus text which more closely approached print parameters, showed there was in fact little or no difference between screen and print, provided that attention was paid to such factors as screen resolution, refresh rates, anti-aliasing, text polarity, etc.

Various researchers found that paging text rather than scrolling was much more acceptable. (Schwarz, Beldie & Pastoor, 1983)

In a study carried out at Hewlett-Packard Labs in California (SID 95 Digest, 1995), E.R. Anthony and J.E. Farrell used a 1200 dpi, 24-bit color screen to simulate printed output, and found that users could detect no difference between the two. This suggests that screen resolution was in fact the major issue, and that given sufficient resolution, existing parameters for achieving legibility in print can be applied to the screen.

Reading performance using screen and paper was directly compared in another study (Osborne and Holton, 1988), which examined the argument that reading from the screen was slower. These researchers paid closer attention to experimental detail, comparing light characters on a dark background and dark characters on a light background on both paper and screen.

They found no significant difference, although readers expressed a clear preference for the “normal” presentation – dark characters on a light background – for both screen and paper.

Key factors:

  • Smaller amount of information on screen than paper
  • Multiple factors involved in readability
  • Legibility factors for paper apply to the screen
  • High contrast is better
  • Non-scrolled text is better than scrolled text
  • Inexperienced users prefer paging

Another study (Gould, Alfaro, Finn, Haupt and Minuto, 1987) reached the same conclusion, finding that a combination of dark characters on a light background, removing jaggedness from screen fonts (in their case using anti-aliasing) and using a high-resolution monitor (in their case, 1000x800) leveled the playing field between screen and paper in terms of reading speed.

Some researchers in the past have suggested that one way of improving screen readability is simply to give users a larger screen. (de Bruijn, de Mul & van Oostendorp, 1992)

This study seems fatally flawed. Virtually all researchers conclude that a myriad of factors, including resolution, refresh rate and character size are involved in producing readable text on screen. Yet this study uses standard and large-size screens with differences in all three of these variables, but bases its conclusions only on the different screen sizes. The researchers argue that these factors can only have had a minimal effect (contradicting the vast body of research to the contrary) and dismiss this by saying further research is required to exclude these “possibly confounding effects”.

Other studies (Richardson, Dillon and McKnight, 1989) (Duchnicky & Kolers, 1983) indicate that screen size is not a major factor in reading performance – although readers expressed preference for larger screens.

Other researchers attempt to bypass the screen issue altogether by developing new technologies such as “electronic paper”, and even inks based on bacteriorhodopsin – a protein derived from “mutant bacteria” – which change color in response to an electrical charge.

While these technologies may prove a future substitute for paper, so far none of the R&D efforts have succeeded in shipping a usable product. Some researchers have suggested that electronic paper that responds to a change in charge could be used to build “electronic books” by binding several hundred sheets of this material together.

This raises the question: if a single sheet can change into any page, why not use just one – and isn’t that (assuming it refreshes fast enough) just a screen by any other name? The “page” still requires electronics to drive it.

It is often dangerous to dismiss nascent technologies. But the failure so far of any of these groups to ship a practical implementation suggests that it is safe to do so at least until they are proved to work.

(Kolers, Duchnicky and Ferguson, 1981) found readers preferred more characters per line rather than larger type sizes, and that static pages were processed more efficiently than pages scrolled at the reader’s preferred rate. Scrolling faster than the preferred rate gave readers better reading efficiency, but created problems of user acceptance.

In one of the few tests that used extended reading times, subjects read printed books and continuous text on screen for two hours. There was no significant difference in reading performance as measured by comprehension scores, nor were there differences evident in eyestrain, dizziness or fatigue.

However, reading from the screen was found to be 28.5 percent slower.

This test found no difference between proportional and non-proportional word-spacing – hardly surprising, as the examples of both were displayed on videotext monitors that looked uniformly awful. Viewing the sample screens in the paper is enough to explain the slower reading performance; it is surprising any of the screen subjects actually made it to the end of the two-hour test.

(Jorna & Snyder, 1991) found that if the image quality (i.e. resolution) of print and screen were equal, they would yield equivalent reading speeds.

(Gould, Alfaro, Barnes, Finn, Grischkowsky and Minuto, 1987) tried to explain causative factors in their earlier findings on screen reading being slower than reading print. They tried to isolate single variables to explain the difference, and concluded that the difference was due to a combination of variables such as display orientation, character size, font or polarity, probably centering on the image quality of the characters themselves.

(Trollip and Sales, 1986) compared unjustified text with “fill-justified” text, i.e. text justified by inserting extra spaces between words. Subjects were asked to read printed samples. They found fill-justified text was read more slowly. This evidence supports assertions by typographers that irregular word-spacing from line to line interrupts the reading flow. If text is justified, it must be combined with hyphenation in order to keep word-spacing constant.

(Gould and Grischkowsky, 1986) examined the effect of the visual angle of a line of characters on reading speed. The experiment found that proofreading speed and accuracy were reduced at extreme visual angles; however, text displayed on most normal computer screens did not produce such extreme angles, and thus visual angle had no effect.

The study used two different typefaces, the 3277 CRT character set (characteristic of VDU displays at that time), and Letter Gothic. The CRT characters were green on a black background, Letter Gothic was black on white. It was found proofreading performance was significantly poorer and slower with the CRT characters.

The experiment varied the size of characters as visual angle (line length) was varied, and the researchers concluded that line length and character size were inter-related variables which contributed to readability in an interactive way.
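For readers who want to relate this to their own set-up, the visual angle is straightforward to compute from the physical width of the text and the viewing distance; the figures in this example are illustrative, not taken from the study.

```python
# A worked example of the visual-angle measure varied in the experiment:
# the angle a line of text subtends at the eye.

import math

def visual_angle_deg(width: float, distance: float) -> float:
    """Visual angle (degrees) subtended by an object of the given width
    viewed from the given distance; both in the same units."""
    return math.degrees(2 * math.atan(width / (2 * distance)))

# A 15 cm line of text at a typical 50 cm screen viewing distance:
print(round(visual_angle_deg(15, 50), 1))   # about 17.1 degrees
```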

(Waern and Rollenhagen, 1983) analyzed the task of reading text from the screen. As with many of these “early” studies (ten years is a very long time in the computer world), much of the data is irrelevant, since “traditional” VDU displays have been superseded by Graphical User Interfaces which display text much more closely resembling paper (though still a long way off).

They listed the following parameters as affecting human vision, citing earlier research (Stewart, 1979):

  • Character size
  • Character shape
  • Inter-character spacing
  • Stability (flicker, shimmer, jitter and swim)
  • Resolution
  • Luminance
  • Contrast
  • Chromaticity

The researchers appear to have ignored other factors – line length, page size, justification etc., all of which were investigated and found to be important variables by the earlier work of researchers like Tinker and Paterson in their classic studies of readability and legibility of print.

8.4 - Innovative approaches

Innovative approaches to reading from the screen have been tried. Scrolling at a fixed pace, at a user-preferred pace, and at faster or slower than the user-preferred pace have all been tested. In all cases, inexperienced users preferred paging to scrolling.

Another innovative approach is referred to as Rapid Serial Visual Presentation; generally this means flashing single words on the screen at extremely high speeds. In some implementations, speed is gradually accelerated.
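The mechanics are simple enough to sketch in a few lines. The console display and rates below are illustrative stand-ins, not any vendor’s implementation.

```python
# A minimal sketch of RSVP: single words flashed in place at a fixed
# words-per-minute rate, optionally accelerating as reading proceeds.

import sys
import time

def rsvp(text: str, wpm: int = 300, accelerate: float = 1.0) -> None:
    """Flash one word at a time; accelerate > 1.0 shortens each interval."""
    delay = 60.0 / wpm
    for word in text.split():
        sys.stdout.write("\r" + word.center(24))   # overwrite in place
        sys.stdout.flush()
        time.sleep(delay)
        delay /= accelerate                        # gradually speed up
    sys.stdout.write("\n")

rsvp("single words are flashed on the screen in rapid succession",
     wpm=300, accelerate=1.02)
```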

Grandiose claims have been made for this technology, in all cases by companies attempting to sell RSVP tools for authoring and reading. None have so far met with commercial success. Claims of reading speeds of 750 words per minute and higher have been made; one company has even claimed that RSVP induces a “trance-like state” in which readers actually see pictures associated with the text.

The “trance-like state” is clearly a nod to the work of Victor Nell.

Researchers have tried to investigate some of these claims seriously.

(Konrad, Kramer, Watson and Weber, 1996) investigated the use of RSVP for dynamic information. They found that RSVP had better performance for clause-level units only if the author had already carried out “chunking” into units of meaning. In other words, RSVP efficiency depended on semantic analysis of content and editing into units of meaning – which would rule it out as a general-purpose panacea to improve readability of text on the computer screen. RSVP worked better on clause-level presentation than word-level presentation. The work also suggests that when objects (words) are viewed in rapid sequence, attention may be diverted or overloaded.

Although this is purely anecdotal, I personally tried all of the implementations of RSVP I could find on the Web, and came to similar conclusions.

Because words differ in length, the “recognition field” varies continuously. This made it tiring to continually refocus on each word.

As presentation speed increased, I had the sensation of gradually being left behind. As an extremely fast reader it was disconcerting to have to stop the display to try to catch up.

Since words were flashed up one at a time, my brain appeared to be working hard at “chunking” units of meaning – having to hold words in some temporary buffer until the full “chunk” had been built, at which time it could be released. (Unfortunately, this extra cognitive load normally meant I was running to catch up with the text, which had continued to change.)

This personal experience reinforced the findings of the paper cited above.

For readers who wish to explore the claims for RSVP in more detail, here are some of the references:

  • The Art Of Legibility (Tenax Software, 1997). A sales job for the company’s MARS and Vortex RSVP software; plenty of references to other RSVP work.
  • Colorado entrepreneur sells software that improves literacy (Knight-Ridder Business News, 26/12/97)
  • AceReader helps you read web pages, email and documents faster! (Stepware, Inc.)
  • Super speed-reading (Forbes, 08/11/97)

A number of less-exotic technologies have been introduced to try to improve screen readability. The most common of these is the use of anti-aliasing (aka grayscaling when applied to black-and-white text and background).

(O’Regan, Bismuth, Hersch and Pappas) published a paper on “perceptually-tuned” grayscale fonts, which claimed that their grayscaling technique improves legibility of type at small (8 and 10 point) sizes.

Professor Hersch, of the Ecole Polytechnique Fédérale de Lausanne, is a world expert in digital typography and the author of numerous books and papers on the subject.

It is clear that the grayscaling techniques employed do improve legibility at these vital sizes (vital, because these are the sizes at which most people prefer – perhaps even need – to read large bodies of text).

However, it is also clear that the improvement is not sufficient to make people comfortable reading for extended periods on the screen. The problem with grayscaling is that it blurs the text in order to smooth out its jaggedness, but it does so by using the same-sized pixels which caused the problem in the first place.
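The basic mechanism is easy to sketch: rasterize the glyph at several times the target resolution, then average each block of samples down to a single gray pixel. The sketch below is a generic box filter, not the “perceptually-tuned” method of the paper cited above; note that the output grid is still built from the same full-sized pixels.

```python
# An illustrative sketch of grayscale anti-aliasing by box filtering:
# a 1-bit bitmap rasterized at 4x resolution is averaged down to the
# target pixel grid, turning jaggies into intermediate grays (blur).

def downsample(hi_res: list[list[int]], factor: int = 4) -> list[list[float]]:
    """Average each factor-by-factor block of a 1-bit bitmap into one
    gray output pixel (0.0 = white, 1.0 = black)."""
    out = []
    for y in range(0, len(hi_res), factor):
        row = []
        for x in range(0, len(hi_res[0]), factor):
            block = [hi_res[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / (factor * factor))
        out.append(row)
    return out

# A diagonal edge: the staircase becomes intermediate grays.
bitmap = [[1 if x <= y else 0 for x in range(8)] for y in range(8)]
for row in downsample(bitmap):
    print(row)
```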

Type has extremely small features that work to enhance its legibility. Trying to portray such small features with the traditional pixel is akin to being asked to paint the Mona Lisa, then being handed a paint-roller.

Grayscaling unfortunately still uses the same size paint-roller. It’s just that along with the black paint we had, we now get a few extra buckets of gray with which to smear the edges.

Especially at small sizes, text looks blurred. It is fatiguing to read for extended periods, since the eye is continually trying to focus the inherently unfocusable.

Another approach is to try to design fonts specifically for reading on the screen, in effect adapting type to the constraints of the pixel.

At Microsoft, we have spent more time, expertise and money on this issue than anyone.

We commissioned Matthew Carter, one of the world’s leading type designers, to create two completely new typefaces for Microsoft – Verdana, a sans serif face, and Georgia, a serif face.

These two faces have shipped with every version of Microsoft Internet Explorer and Microsoft Windows since 1997. They have thus been in constant use by millions of people daily. They have been hailed as great standards for the Web. They have been made available for free download from Microsoft’s website.

And they were not good enough – until ClearType freed them from the constraints of the pixel.

A research study Microsoft commissioned from Carnegie Mellon University (Boyarski, Neuwirth, Forlizzi, and Regli, 1997) found that Georgia was indeed more readable than Times New Roman, a Windows core font designed for print but which had been extensively tuned for screen readability. The study found Verdana was even more readable than Georgia.

Anti-aliased versions of fonts were easier to read and more legible than non-anti-aliased versions.

The study also compared Microsoft’s anti-aliasing algorithms with those of Adobe Systems, and found that readers preferred the Adobe anti-aliasing (although, thankfully, only to a very small degree!).

Microsoft also commissioned other research at the University of Reading (pronounced “redding”) in the UK, which has a large and internationally-renowned Typography and Graphics department. This study examined the effect of line length on readability (Dyson and Kipping, 1996).

Accepted wisdom among graphics designers and typographers is that lines of 55-65 characters (at type sizes from 9-12 point) are most readable.

Dyson and Kipping found that longer lines of 100 characters were actually read more quickly, but that readers were less comfortable with these longer lines, preferring the more normal line lengths.

They also found that scrolling was slower than paging.

With typical academic reluctance to jump to conclusions, these researchers suggested that the difference between readers’ perceptions and actual performance means it is difficult to make practical recommendations on optimal line length.

Microsoft commissioned a follow-up study to investigate this in more detail (Dyson and Haselgrove, 1998), which had similar results.

At Microsoft – where we have spent a great deal of effort over many years trying to make people feel more comfortable using computers – we take a more pragmatic view. “Perception is everything.” Especially when applied to long-duration reading tasks such as books, comfort is far more important than performance – since we read books at our own pace in any event, and pacing is driven by content.
