Simplification and Generalization of Large Scale Data for Roads: A Comparison of Two Filtering Algorithms

ABSTRACT: This paper reports the results of an in-depth study which investigated two algorithms for line simplification and caricatural generalization (namely, those developed by Douglas and Peucker, and by Visvalingam, respectively) in the context of a wider program of research on scale-free mapping. The use of large-scale data for man-designed objects, such as roads, has led to a better understanding of the properties of these algorithms and of their value within the spectrum of scale-free mapping. The Douglas-Peucker algorithm is better at minimal simplification. The large-scale data for roads make it apparent that Visvalingam's technique is not only capable of removing entire scale-related features, but that it does so in a manner which preserves the shape of retained features. This technique offers some prospects for the construction of scale-free databases, since it provides some scope for achieving balanced generalizations of an entire map consisting of several complex lines. The results also suggest that it may be easier to formulate concepts and strategies for automatic segmentation of in-line features using large-scale road data and Visvalingam's algorithm. In addition, the abstraction of center lines may be facilitated by the inclusion of additional filtering rules with Visvalingam's algorithm.

Mahes Visvalingam and Peter J. Williamson

KEYWORDS: line simplification; line generalization; roads; large scales

The main aim of this paper is to explore the value of Visvalingam's algorithm, reported in Visvalingam and Whyatt (1993), within the context of scale-free mapping. The properties and behavior of this algorithm are compared with those of the well-known Douglas and Peucker algorithm (1973). Like most other line simplification algorithms, Douglas and Peucker's algorithm (widely known as the Douglas-Peucker algorithm) may only be used for modest levels of generalization. In contrast, Visvalingam's scheme for elimination of points appears to be useful for filtering scale-related features and for producing acceptable caricatural generalizations of lines even at fairly gross levels of reduction. The original study by Visvalingam and Whyatt (1993) was based on coastline data, captured at the 1:50,000 scale, which were progressively generalized to a 1:10M scale; i.e., the study was concerned with relatively small-scale applications.
The comparison of these algorithms using road data, vector digitized from 1:1,250 large-scale plans, is also undertaken here within the broader problem of generating scale-free digital maps (or spatial databases). This requires the generalization and structuring of information for flexible use. Such scale-free digital maps will become the ultimate reference maps from which a variety of visual maps will be derived (Guptill and Starr 1984). Visvalingam (1990), in her model of the scope of digital cartography, distinguished the sub-fields of digital and visual mapping, which focus on digital maps and visual maps, respectively. The present study is mainly concerned with the problems of information generalization within the sub-field of digital mapping. The generation of visual maps involves the use of additional generalization operators, which are outside the scope of this study.
Roads, being people-designed features, are relatively smooth compared with coastlines. Given the absence of spiky detail, it was assumed that the two algorithms would perform equally well and that only a small proportion of points could be omitted during minimal simplification; i.e., before the shape of the roads became noticeably altered. Visvalingam and Whyatt (1993) identified some limitations of Visvalingam's method and indicated a need for further testing. The application of this algorithm to large-scale data was undertaken to study its behavior and limits of applicability. Given that roads are engineered to be straight or smoothly curving, it was assumed that there would be limited scope for data compression through point

Cartography and Geographic Information Systems, Vol. 22, No. 4, 1995, pp...

Table 1. Performance of two filtering algorithms at various levels of generalization.
Note: There are 1,610 points on the map, of which 46 are line ends. The selected line consists of 268 points, including the 2 end points. Percentages are of internal points only.
lining of roads for use within geographic information systems (GIS). Segmentation will also facilitate the subsequent process of display generalization. The two algorithms are reviewed in the following section, which also describes the data used in the study. Some of the results are then presented and their implications discussed.

Background
In this section, we briefly review and compare the algorithms designed by Douglas and Peucker and by Visvalingam, respectively. Only these two methods are compared because they are academically superior to others in their abstraction of key principles and in their elegant formulations. Further, these two algorithms are easier to implement than some others, such as those described by Deveau (1985), Dettori and Falcidieno (1982), and Beard (1991). It is also easy to comprehend the cartographic implications of their geometric processing. In addition, the Douglas-Peucker algorithm remains the most widely used in digital cartography and in geographical information systems. Visvalingam and Whyatt (1990; 1991a; 1991b) provide a detailed background and evaluation of the Douglas-Peucker algorithm and some problems of implementation. Only the gist of the algorithm is provided here.
The Douglas-Peucker algorithm was initially developed as a weeding algorithm for removing spurious variations in digitized lines which fall within the width of the source graphic line. In pattern recognition, Ramer (1972) had independently presented a different formulation of this algorithm for deriving a minimal description of plane curves and had observed some of its limitations and some special cases (see Visvalingam and Whyatt 1991a). The method for weeding consists of joining the two ends of the line with a straight line, called the base line. The perpendicular distances of all intermediate points from this base line are then calculated. If all these distances are less than some pre-defined tolerance representing half the width of the graphic line at source scale, these points may be discarded and the original line can be represented by the base line. If any of the intermediate points fall outside the tolerance band, the line is split into two parts at the furthest point and the process is repeated on the two resulting parts. This method has been used for line generalization by selecting a tolerance value attuned to the target scale. It has also been used to rank points into a hierarchy that reflects their order of selection and their importance within this scheme.
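The weeding procedure just described can be sketched in a few lines of Python. This is a minimal recursive illustration, not the authors' implementation: it assumes planar coordinates, a tolerance in the same ground units, and distances measured to the infinite base line; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the base line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Closed loop (a == b): fall back to the distance from the anchor.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Weed a line, always keeping its two end points."""
    if len(points) < 3:
        return list(points)
    anchor, floater = points[0], points[-1]
    # Find the intermediate point furthest from the base line.
    split, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], anchor, floater)
        if d > max_dist:
            split, max_dist = i, d
    if max_dist <= tolerance:
        # Every intermediate point lies inside the tolerance band.
        return [anchor, floater]
    # Split at the furthest point and repeat on both parts.
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # the split point appears only once
```

Because the furthest point is always retained once it lies outside the band, a spike or minor feature can be selected as a critical point, which is the behavior questioned in the next paragraph.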
Visvalingam and Whyatt (1990; 1991b) explain why the use of the tolerance band for elimination of points is reasonable. However, the selection of the furthest point outside the band as a critical point to be retained is unreliable, because this point may be located on spikes (errors) or on minor features. Manual line generalization is more concerned with preserving salient shapes than with selecting specific points. Indeed, it is very difficult to hand-pick precise points on natural features in a consistent way. Visvalingam's technique is based on the observation that it is easier to eliminate entire features. Unfortunately, the automatic identification of geomorphological features remains an outstanding problem in digital cartography. Although Visvalingam's technique filters points, it is conceptually oriented towards removing geometric features.
In brief, the algorithm makes multiple passes over the line. On each pass, it eliminates the point which it regards as least important. A variety of metrics may be used to measure the importance of points; Visvalingam and Whyatt (1993) tested the concept of effective area. This is the area of the triangle formed by connecting the point with its two neighbors: it measures the area by which the current line would be displaced as a result of removing that single point. When a point is removed, the effective areas of the adjacent points need to be recalculated before the next pass. In outline, the effective area E of every internal point is computed; the point with the smallest E is then repeatedly eliminated and recorded together with its E, and E is recomputed for its two neighbors, until only the end points remain. Note that all points are tagged with E and that their elimination sequence is recorded. The tagged points may then be filtered at runtime by interactive fine-tuning of the tolerance parameter for E. The theoretical ideas underpinning this algorithm, the special cases, and implementation issues are discussed in detail by Visvalingam and Whyatt (1993).

The Douglas-Peucker algorithm has been in use for some 20 years and has been widely studied. Visvalingam's technique is relatively new and there is a need for further testing. Whyatt (1991) obtained better results with Visvalingam's algorithm than with those of Douglas and Peucker, Dettori and Falcidieno (1982), and Roberge (1985). Visvalingam and Whyatt (1993) demonstrated that Visvalingam's technique can filter out features within features progressively and thus achieve both minimal and caricatural generalization, whereas the Douglas-Peucker method is only suitable for minimal simplification. Also, Visvalingam's technique uses fewer points when representing the same shapes. However, Visvalingam and Whyatt (1993) regard their findings as just one step towards a more intelligent system for line generalization and point out some limitations.
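The elimination scheme can likewise be sketched in Python. This is a deliberately simple O(n²) illustration, not Visvalingam and Whyatt's implementation. The function names are hypothetical. Clamping each recorded area so it is never smaller than the previous one is an assumption, though a common choice in implementations: it ensures that filtering on E reproduces a state actually reached during elimination.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def effective_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle formed by point b and its two neighbours a and c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def elimination_sequence(points: List[Point]) -> List[Tuple[int, float]]:
    """Eliminate internal points one per pass; return (index, E) in elimination order."""
    current = list(range(len(points)))  # indices still on the part-generalized line
    sequence: List[Tuple[int, float]] = []
    last_e = 0.0
    while len(current) > 2:
        # Areas are taken on the current line, not the original one. For clarity
        # all are recomputed on each pass, although only the two neighbours of
        # the previously removed point can have changed.
        best_pos, best_e = 1, float("inf")
        for pos in range(1, len(current) - 1):
            e = effective_area(points[current[pos - 1]], points[current[pos]],
                               points[current[pos + 1]])
            if e < best_e:
                best_pos, best_e = pos, e
        # Assumed rule: never tag a point with less than the previous tag, so the
        # points with E >= t are always a suffix of the elimination sequence.
        last_e = max(last_e, best_e)
        sequence.append((current.pop(best_pos), last_e))
    return sequence


def filter_line(points: List[Point], sequence: List[Tuple[int, float]],
                tolerance: float) -> List[Point]:
    """Runtime filtering: keep both end points and every point tagged E >= tolerance."""
    tags = dict(sequence)
    last = len(points) - 1
    return [p for i, p in enumerate(points) if i in (0, last) or tags[i] >= tolerance]
```

The elimination is run once and stored; only `filter_line` needs to be re-run as the tolerance is adjusted interactively. A heap keyed on E would reduce the cost for long lines without changing the result.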
The evaluation of the two algorithms in this paper is based on a sample of roads from the 1:1,250 OSBASE data created by the Ordnance Survey for experimental purposes. This database has now been superseded by other experimental products. The nature of the link-and-node structured OSBASE data and the complex process of extracting roads were described by Varley and Visvalingam (1994). Data for complex road polygons in a 500 x 500 m area (shown in Figure 1) are used in this paper. The coordinates were digitized to 5-cm precision. The data were pre-processed as follows. All the map edge links were removed from the road polygons. This segments the road polygons into 23 road sections. Some of these start and end at the map edge, while others form closed loops describing the islands in roads.
The road boundary sections are thus anchored to the map edge so that the connectivity of the road network across map sheets can be maintained. Since the algorithms always retain at least the two end points of each section, the percentages of points retained in Table 1 have been calculated after discounting these end points. Visvalingam and Whyatt (1993) pointed out that the inclusion or omission of a few points could markedly alter the shape of a line, particularly at gross levels of generalization. It was therefore inappropriate to select fixed percentages of points when comparing the two methods. Instead, an effort was made to find cut-off values which provided comparable numbers of points but which did not bias the shapes in favor of a particular method.
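The percentage convention used for Table 1 (end points discounted) and the search for a cut-off that yields a given number of points can each be expressed as a small helper. Both functions are hypothetical illustrations. The second assumes a per-point tag where larger means more important, such as Visvalingam's E.

```python
from typing import Sequence


def percent_internal_retained(retained: int, total: int, end_points: int) -> float:
    """Percentage of internal points kept, discounting end points that are always kept."""
    internal_total = total - end_points
    internal_kept = retained - end_points
    return 100.0 * internal_kept / internal_total


def tolerance_for_count(tags: Sequence[float], wanted: int) -> float:
    """Largest tolerance t such that keeping tags >= t retains at least `wanted` points.

    Ties at t may retain a few more points than requested.
    """
    if not 1 <= wanted <= len(tags):
        raise ValueError("wanted must be between 1 and len(tags)")
    return sorted(tags, reverse=True)[wanted - 1]
```

For example, a hypothetical filtered version of line AB (268 points, 2 end points) that kept 95 points would give `percent_internal_retained(95, 268, 2)`, about 35% of its 266 internal points.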

Observations
Detailed exploratory analysis revealed that the road boundaries encode various types of detail, namely:
1. small irregularities on the road which do not significantly alter the shape of lines at source scale;
2. tight curves at filleted junctions and roundabouts;
3. minor features, e.g., lay-bys, metalled entrances to drives, roundabouts;
4. small branch roads;
5. major branch roads; and
6. large features, such as car parks and large islands in roads.
The tolerance values listed in Table 1 were selected to demonstrate the performance of these two algorithms at filtering out these types of features. The effects of these various cut-offs are noted below.

Minimal Simplification
The term minimal simplification refers to the lowest level of generalization described by Jenks (1981). Minimal simplification selects a reduced set of points, for example by weeding, to represent the original line as faithfully as possible. The effects of weeding and minimal simplification were studied using the section of the line extending from A to B in Figure 1, which consists of 268 points (266 internal points), and using maps drawn at larger-than-source scale. These maps did not show any noticeable displacement of the source line using a 1 mm (0.125 m on the ground) tolerance. Both methods may be used to weed over 55% of internal points. Visvalingam and Whyatt (1993) were able to achieve good minimal simplification of coastlines with just 23% of points. However, with this data set it was possible to visually detect deviation from the original line even with 35% of points; see Figure 2a, which was originally drawn at slightly larger than 1:1,250 scale. The original line is shown as a dotted line and the simplified line as a solid line. The simplified line in Figure 2a is noticeably displaced from the original line. Figure 1 shows that the majority of points occur along curves. Visvalingam's technique tends to cut curves, which are better preserved by the Douglas-Peucker algorithm.
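Tolerances quoted in millimetres refer to the map, so their ground equivalent depends on the scale at which the map is drawn. A one-line helper (hypothetical name) makes the arithmetic explicit.

```python
def ground_tolerance_m(map_tolerance_mm: float, scale_denominator: float) -> float:
    """Ground distance (m) represented by a map distance (mm) at scale 1:scale_denominator."""
    return map_tolerance_mm * scale_denominator / 1000.0


print(ground_tolerance_m(1.0, 1250))  # 1 mm at the 1:1,250 source scale
print(ground_tolerance_m(1.0, 125))   # 1 mm on a plot enlarged ten times
```

Thus 1 mm corresponds to 1.25 m on the ground at 1:1,250 and to 0.125 m at 1:125.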
The differences between the two methods become more noticeable in Figures 2b and 2c. Visvalingam's technique is cutting curves, but even the Douglas-Peucker algorithm is beginning to drop points on curves and, more importantly, to displace the course of the line. Figure 2d shows that the Douglas-Peucker algorithm is able to retain all the features in the map using fewer than 20% of internal points, although some displacement of the lines is noticeable even at reduced scale. Visvalingam's technique is beginning to drop points on minor features. It appears that the Douglas-Peucker algorithm is better than Visvalingam's area-based method for minimal simplification: it requires fewer points for minimal simplification of large-scale data. This contradicts the conclusions drawn by Visvalingam and Whyatt (1993) in the context of small-scale data.

Elimination of Minor Features
The tolerance values had to be selected more carefully for this level of generalization. Figure 3a(i) illustrates the best shape which could be achieved with the Douglas-Peucker method. There is noticeable displacement of the course of the lines. By comparison, Visvalingam's technique shows no such marked deviation from the original line except at curves. When the same tolerance is applied to the whole map (Figure 3b(ii)), minor features such as lay-bys, metalled entrances to roads, and very narrow islands in roads are removed. The curves at road junctions and the curved sections of road islands have become truncated. In contrast, the Douglas-Peucker algorithm (Figure 3b(i)) produces less satisfactory results: the curves and minor features are only partially retained, distorting the shapes of roads. The use of large-scale data reveals that Visvalingam's technique tends to short-cut curves, a tendency which was not noticed by Visvalingam and Whyatt (1993).

Elimination of Major Features
The Douglas-Peucker algorithm retains extreme points at the expense of overall shape with around 5% of points (Figures 4a and 4b) … main feature. This is preferable to the behavior of generalization routines in the Whirlpool program as illustrated by Beard (1991). In her maps, heads of estuaries become detached into lakes. Here, roundabouts are likely to become detached from roads.
Figure 4c shows that Visvalingam's technique can produce an appropriate shape using only two internal points. Figures 4d and 5 demonstrate the well-known fact that the Douglas-Peucker algorithm cannot be used for caricatural generalization. Visvalingam and Whyatt (1993) observed that Visvalingam's algorithm has the effect of omitting scale-related features in their entirety. This study reveals that the shapes of the retained features are fairly well preserved even at gross levels of generalization.

Discussion
Visvalingam and Whyatt (1990) noted that the point selected by the Douglas-Peucker method as most distant from the anchor-floater line has a tendency to fall on minor features. The designation of these extreme points as critical points, and their retention, inhibits the scope for achieving caricatural generalization. In contrast, Visvalingam's technique has the effect of progressively removing scale-related features at increasing levels of generalization, even though the technique itself does not as yet have any intelligence to recognize features as such. The present study draws attention to aspects of the algorithm which were not obvious when using small-scale data. First, Visvalingam's technique initially eliminates points on smoothly varying curves, since they are more closely spaced and result in small effective areas. A slightly offset or redundant intermediate point on a long straight section can still subtend a larger area, because the area grows with the length of the base as well as with the offset, so such points survive longer than points on curves. Thus, offset-based methods are better at weeding and minimal simplification. Visvalingam's method works on the part-generalized line and not on the original line. As a result, the generalized line can become progressively displaced from the original line in smoothly curved sections, resulting in noticeable truncation of filleted corners.
At first, this was seen as a defect. However, on reflection it was understood to be the most significant and useful property of the algorithm. That is to say, it was this same process of working relative to the current line, rather than the original line, which enabled Visvalingam and Whyatt (1993) to produce appropriate scale-related caricatures of Carmarthen Bay: it is this process which results in the progressive elimination of scale-related features in roads and coastlines. Although the result gave the impression of an elimination of scale-related features, Visvalingam and Whyatt point out (1993, 50) that they could not use the algorithm in its present form to automatically segment the coastline into its constituent features, except when a number of adjoining points were eliminated together; for example, on the River Ouse and Spurn Head. It could be argued that even manual segmentation of lines describing natural coastlines is likely to be problematic and subjective. In contrast, the truncation of curves on people-designed artifacts makes them automatically detectable.
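The first observation above can be checked numerically: closely spaced points on a curve have small effective areas, while a slightly offset point on a long straight can subtend a larger area. The sketch below uses invented geometry purely for illustration: an arc of radius 10 m sampled every 0.5 m, and a 40 m straight with a 5 mm kink. The curve point has the larger perpendicular offset but the much smaller effective area. An area-based method therefore removes it first, whereas an offset-based method would rank it as the more important of the two.

```python
import math


def effective_area(a, b, c):
    """Area of the triangle formed by b and its neighbours a and c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def offset(p, a, b):
    """Perpendicular distance of p from the base line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / math.hypot(dx, dy)


# Invented geometry: three consecutive points 0.5 m apart along an arc of radius 10 m.
radius, spacing = 10.0, 0.5
step = spacing / radius  # angle (radians) between successive points
curve = [(radius * math.cos(k * step), radius * math.sin(k * step)) for k in range(3)]

# Invented geometry: a 40 m straight whose midpoint is offset by 5 mm.
straight = [(0.0, 0.0), (20.0, 0.005), (40.0, 0.0)]

for name, (a, b, c) in (("curve", curve), ("straight", straight)):
    print(f"{name}: area={effective_area(a, b, c):.4f} m^2, offset={offset(b, a, c):.4f} m")
```

With these numbers the curve point subtends roughly 0.006 m² against 0.1 m² for the straight, even though its offset (about 12.5 mm) is more than twice as large.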
Secondly, this truncation of curves by Visvalingam's technique allows better strategies to be adopted for both representing and generalizing the road. We could continue to represent the truncated curve as a detached list of points and use the Nth point algorithm to select a scale-determined subset. Or, we could store the curve in compact parametric form and generate the appropriate number of points. Moreover, the fact that the original line refers to a road enables us to add semantic labels to curves, since tight curves only occur in particular contexts within roads; for example, at junctions and roundabouts. In the context of road center line calculations (which was one of the original reasons for this study), the detection of such a feature could be used to trigger the search for corresponding elements. Thus, the output of Visvalingam's technique is potentially more useful than that produced by the Douglas-Peucker algorithm.
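Both options for a detached curve can be sketched briefly. The Nth point routine keeps every nth point while always retaining both ends. For the parametric option, this sketch assumes, purely for illustration, that a fillet is a circular arc stored as centre, radius and start/end angles. The function names and the circular-arc representation are hypothetical; the text does not specify a parametric form.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nth_point(points: List[Point], n: int) -> List[Point]:
    """Keep every nth point of a detached curve, always retaining both end points."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(points) <= 2:
        return list(points)
    kept = points[::n]
    if (len(points) - 1) % n != 0:  # last point not already picked up by the stride
        kept.append(points[-1])
    return kept


def arc_points(cx: float, cy: float, radius: float,
               start: float, end: float, count: int) -> List[Point]:
    """Regenerate `count` evenly spaced points on a circular arc (angles in radians)."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (end - start) / (count - 1)
    return [(cx + radius * math.cos(start + k * step),
             cy + radius * math.sin(start + k * step)) for k in range(count)]
```

With the parametric form, the number of points becomes a display-time choice (for example, more points when zooming in), whereas the Nth point approach can only thin the stored points.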
Thirdly, Visvalingam's technique drops minor features, such as entrances to driveways, relatively early, before distorting the distinctive shape of the road. The removal of these unwanted features again expedites center line calculations; i.e., more abstract structural generalization. The automatic detection and segmentation of sub-features will not only facilitate the automatic derivation of the road center line network, but will also enable the addition of a rule-base to available attribute data to expedite the selection/omission of parts of the road boundary on application-defined criteria. The one-off construction of scale-free databases is thus facilitated by Visvalingam's algorithm.
Seeing is a constructive process and it places different constraints on the generalization of natural and cultural features. While even highly stylized caricatures of natural features, such as coastlines, are acceptable, we tend to reject shapes which do not coincide with our expectations of designed artifacts, such as roads. The shapes output by the Douglas-Peucker algorithm do not correspond to our expectations of roads at gross levels of generalization (see Figure 5i). We tend to expect the shapes of such features to correspond to their transport functions. Even at fairly gross levels of generalization, the shapes output by Visvalingam's technique, though not acceptable in printed maps, would be recognizable as roads when zooming out of ephemeral interactive displays.
Finally, in Figure 5, the map generalized using Visvalingam's algorithm has become unbalanced and some roads have acquired a funnel shape. However, these problems arise partly as a result of edge effects. Normally, such extreme generalization would only be undertaken when these data form part of the data for a much larger area. When part-generalized data for adjacent sheets are seamed together, it is highly unlikely that the end points of lines falling on the map edge will be retained. The automatic removal of these points on seaming will correct the funnel shapes. Also, some of the smaller feeder roads, such as A in Figure 5, are likely to be dropped, while small blocks of intervening land, such as C in Figure 1, are likely to be retained or correctly processed when seamed with adjoining sheets. Figure 3b(ii) shows another problem. One narrow island in the road (D in Figure 1) was eliminated before the others. This could pose a problem in center line calculation. However, the metrics and rules driving Visvalingam's technique can be adapted to ensure that all the necessary elements are retained for center line calculation. For example, a rule that each polygon must be represented by at least two internal points would overcome this problem.
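The rule suggested above can be applied during runtime filtering, given the recorded elimination sequence of a section. In this sketch, the function name and the example data are hypothetical. A closed section that would fall below the minimum gets back its most recently eliminated points, i.e. those with the largest E. The same rule could equally be built into the elimination loop itself.

```python
from typing import List, Tuple


def filter_section(sequence: List[Tuple[int, float]], n_points: int,
                   tolerance: float, min_internal: int = 0) -> List[int]:
    """Indices of points retained on one line section.

    sequence: (index, E) pairs in elimination order, least important first.
    Points with E >= tolerance are kept. If fewer than `min_internal` internal
    points survive, the most recently eliminated points are reinstated.
    """
    keep = {i for i, e in sequence if e >= tolerance}
    for i, _ in reversed(sequence):  # most important (last eliminated) first
        if len(keep) >= min_internal:
            break
        keep.add(i)
    return [0] + sorted(keep) + [n_points - 1]


# Hypothetical island in the road: a closed loop whose first and last points coincide.
island_sequence = [(2, 0.4), (1, 0.9), (3, 1.5)]
print(filter_section(island_sequence, 5, tolerance=2.0))                  # [0, 4]
print(filter_section(island_sequence, 5, tolerance=2.0, min_internal=2))  # [0, 1, 3, 4]
```

Without the rule, the island collapses to its start and end point (a single location), so it would be lost from the center line calculation; with `min_internal=2` it remains a polygon.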

Summary and Conclusion
Our understanding of the potential role of point-filtering algorithms within cartographic generalization has been extended by this study. Most point-filtering algorithms may only be used for achieving modest levels of generalization, largely through line simplification. Both the Douglas-Peucker and Visvalingam's algorithms need only about 40% of the original points for minimal simplification, i.e., before shape distortions become noticeable. The Douglas-Peucker algorithm is superior to Visvalingam's method when it comes to minimal simplification of roads on 1:1,250 maps using a tolerance of less than one to two meters. Further elimination of points produces effects which are noticeable even on reduced displays; the retention of extreme points by the Douglas-Peucker method causes shape distortion.
Visvalingam's algorithm, which was conceived to filter features, opens up opportunities for research into other aspects of information generalization. As it stands, it filters only triangular geometric features. However, the results suggest that the algorithm could be extended to identify, encode and eliminate scale-related features. It is possible to achieve a balanced generalization of an entire map consisting of several lines using a single tolerance with Visvalingam's algorithm. The Douglas-Peucker algorithm cannot be used with a single tolerance to generalize satisfactorily even a single complex line, such as AB in Figure 1 (see discussion in Visvalingam and Whyatt 1990).
Visvalingam and Whyatt had previously noted (1993, 50) that the progressive elimination of smaller features, such as rivers feeding into estuaries, provides some scope for segmentation of features for intelligent generalization. Once segmented, the rivers can be subjected to minimal simplification, or they may be collapsed into their center lines or be tagged by users for knowledge-based filtering as suited to the scale and purpose of the map. The same ideas are applicable in the context of large-scale data. When viewed within such a range of generalization possibilities, McMaster's (1987) mathematical criteria for comparing algorithms (for example, areal and vector displacement relative to the original line) seem inappropriate, particularly for caricatural generalization.
Given that it is possible to automatically extract roads (Varley and Visvalingam 1994), it should be possible to include semantic rules for recognizing truncated features such as roundabouts on cul-de-sacs, curves at road junctions, branch roads and so on. By offering some prospects for the clean segmentation of lines into substantive constituents, Visvalingam's approach also encourages the use of more intelligent procedures than point filtering for the recording and display of minimally simplified lines. For example, the chord truncation of filleted corners enables us either to represent the fillets in parametric form or to flag the segmented fillet for Nth point sampling.

Figure 3a. The effect of selecting only 12% of internal points using algorithms by (i) Douglas-Peucker and (ii) Visvalingam.

Figure 3b. The effect of selecting only 8.6% of internal points using algorithms by (i) Douglas-Peucker and (ii) Visvalingam.
Figure 4c. Extreme generalization of the line retaining only two points with Visvalingam's algorithm.

Figure 4d. Lack of scope for achieving suitable results using the Douglas-Peucker algorithm.