Using a Geolocation Social Networking Application to Calculate the Population Density of Sex-Seeking Gay Men for Research and Prevention Services


Figure 4. Map of Atlanta showing 79 data collection points from profiles on a sex-seeking networking app; radii of yellow circles represent distance to user sample at the maximum distance from the sample point, and overlapping circles completely cover Atlanta, with smaller circular areas used for data collection where there were the largest numbers of application users.



The data in this study present an unusual challenge for geospatial statistical methods because they combine the characteristics of point and area processes [32-37]. Data are collected at points on a grid, but the value at each point represents a density over a circular sampling area centered on that point. Still, the data are more analogous to point data, with the measure collected at each point representing an area rather than an individual observation. We therefore chose to treat the density of users per square mile as the measure of interest but to use point data statistics [32,38,39] to summarize over the entire study area. ArcGIS [39] performs kernel smoothing to estimate the density at each sample point, with each sample point weighted by the observed population density at that point. In our case, the Kernel Density smoother [34] counts every white and black user observed at that location. For example, a point at which we observed 12 profiles within 2 miles, including 8 white and 4 black users, would be counted 8 times in the white density measure and 4 times in the black density measure. Next, these weighted values for each point are averaged with those of other points within a specified radius [32,36,37], resulting in a smoothed surface representing the density of users, by race, in the sample space. The kernel approach may place nonzero density in areas where no data were collected, but only as a result of averaging between points separated by the area with no data. We also experimented with interpolation methods for spatial data, such as kriging [32,38], and found similar results; we focus on kernel density estimates here.

As noted above, sampling was conducted at different times and days of the week over a 6-month period (see Multimedia Appendix 1 for documentation of days and times sampled). While an in-depth analysis of time of day and day of week variability is of interest for future research, to illustrate our approach, we present the kernel densities calculated here as averages over sampled days and times.
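The weighting step described above can be illustrated with a minimal sketch. It assumes a quartic (biweight) kernel and planar coordinates in miles; the text does not specify the exact kernel or scaling that the software applies, so the function names and the kernel choice here are illustrative assumptions, not the authors' implementation.

```python
import math

def quartic_kernel(dist, radius):
    """2-D quartic (biweight) kernel: largest at dist = 0, zero at or beyond radius.

    Assumed kernel shape; it integrates to 1 over the search circle.
    """
    if dist >= radius:
        return 0.0
    u2 = (dist / radius) ** 2
    return 3.0 / (math.pi * radius ** 2) * (1.0 - u2) ** 2

def kernel_density(x, y, points, radius):
    """Weighted kernel density at location (x, y).

    points: iterable of (px, py, weight), where weight is the number of
    profiles of one group observed at that sample point.
    """
    total = 0.0
    for px, py, weight in points:
        total += weight * quartic_kernel(math.hypot(x - px, y - py), radius)
    return total

# Example from the text: one sample point with 8 white and 4 black users.
white_points = [(0.0, 0.0, 8)]
black_points = [(0.0, 0.0, 4)]
white_here = kernel_density(0.0, 0.0, white_points, radius=2.0)
black_here = kernel_density(0.0, 0.0, black_points, radius=2.0)
print(white_here / black_here)  # the point counts twice as heavily for white users
```

Because each sample point contributes in proportion to its weight, a location with 8 white and 4 black users contributes twice as much to the white surface as to the black surface, and nothing beyond the search radius.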

After estimating the population density, we used ArcGIS to compute the mean and standard deviation for the calculated density measure over the entire sample space. We compared density surfaces through ratio and difference measures via the Map Algebra tool in ArcGIS, which solves standard algebraic equations at each point in a grid across the density surface and creates a new map displaying the results of these calculations. When comparing the density of users, the difference between surfaces for different races, for example, (density of black users – density of white users) has the property that its null value (no difference) is zero, and if positive, it identifies an area with a higher density of black users than white users. This represents an absolute difference in the densities of the two groups. When positive, this approach identifies areas where it might be easier to recruit black users because the density of black users is greater in absolute terms (ie, the number of excess individuals). We note that this example says nothing about the magnitude (size of the density of black and/or white users), only that one number is bigger than the other. To capture areas where there are relatively more black users than white users (ie, the ratio of black to white users is higher), we also calculated the ratio of the two density surfaces.
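As a rough illustration of the cell-by-cell comparison, the following sketch applies a difference and a ratio to two small grids. The grid values are invented for illustration, and returning None where the white density is zero is an assumption of this sketch; the text does not say how such cells were handled.

```python
def map_algebra(surface_a, surface_b, op):
    """Apply op(a, b) to each pair of matching cells in two equally sized grids."""
    return [
        [op(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(surface_a, surface_b)
    ]

def difference(black, white):
    """Absolute difference: positive where black density exceeds white density."""
    return black - white

def ratio(black, white):
    """Relative difference: above 1 where black density exceeds white density.

    Cells with zero white density return None (an assumption of this sketch).
    """
    return black / white if white > 0 else None

# Hypothetical density surfaces (profiles per square mile) on a 2 x 2 grid.
black_surface = [[2.0, 10.0], [0.5, 4.0]]
white_surface = [[1.0, 20.0], [0.0, 4.0]]

diff_surface = map_algebra(black_surface, white_surface, difference)
ratio_surface = map_algebra(black_surface, white_surface, ratio)
print(diff_surface)   # [[1.0, -10.0], [0.5, 0.0]]
print(ratio_surface)  # [[2.0, 0.5], [None, 1.0]]
```

The top-left cell shows why the two measures are complementary: the ratio is 2, yet the absolute excess is only 1 profile per square mile.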

As a further exploration of the possibilities with the approach, we also considered a measure to highlight areas with the largest densities for each race and then compare these areas as follows. First, for each density surface (eg, the density of black users ≤25 years of age) we identified areas with the highest density values (density value > mean + 2 SD). For example, if the estimated mean density for white users was 14/square mile with a standard deviation of 7, we would ask ArcGIS to select points with a density of white users greater than 28. We then used Map Algebra to calculate the difference between the surfaces including these highest density points for each race according to the following formula:

I(Density of black users > mean + 2SD of estimated kernel density distribution) − I(Density of white users > mean + 2SD of estimated kernel density distribution)

where I(statement) represents an indicator function with value 1 if the statement is true and zero otherwise. This equation takes only three values: zero when a point is greater than mean + 2SD of both distributions or neither is greater than mean + 2SD; 1 when a point is greater than the mean + 2SD for only the first distribution; and -1 when the point is only greater than the mean + 2SD of the second distribution. This measure identifies not only locations with more users of a given race, but also locations with the highest density areas overall. Similar measures can be constructed to highlight other features of interest, for example, comparing densities by age group or combinations of race and age. Finally, to provide some context to our results, we present them in relation to the location of recruitment sites seeking to enroll MSM for two ongoing HIV prevention studies in Atlanta.
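The indicator difference can be sketched in a few lines. The white cutoff of 28 follows the worked example in the text (mean 14, SD 7); the black cutoff of 20 and all grid values are hypothetical and chosen only to produce each of the three possible outcomes.

```python
def indicator_difference(black, white, black_cutoff, white_cutoff):
    """I(black > black_cutoff) - I(white > white_cutoff) for each grid cell.

    Each cutoff is the mean + 2 SD of that group's own density surface.
    Returns 1, 0, or -1 per cell.
    """
    return [
        [int(b > black_cutoff) - int(w > white_cutoff) for b, w in zip(row_b, row_w)]
        for row_b, row_w in zip(black, white)
    ]

white_cutoff = 14 + 2 * 7   # 28, from the example in the text
black_cutoff = 20           # hypothetical value for illustration

# Hypothetical density surfaces on a 2 x 2 grid.
black_surface = [[25, 25], [5, 5]]
white_surface = [[30, 10], [30, 10]]

result = indicator_difference(black_surface, white_surface, black_cutoff, white_cutoff)
print(result)  # [[0, 1], [-1, 0]]
```

The top-left cell is extreme in both surfaces and the bottom-right cell in neither, so both give 0; the remaining two cells are extreme for only one group.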


Over a 2-week period, we spent a total of 21 hours traversing Atlanta, collecting data at the 79 sample points (Figure 2). The sampling circles covered 883 square miles of area (Figure 4) and overlapped so that together they covered the entire 132.4 square miles of the city of Atlanta. The average radius of data collection at each sample point was 1.65 miles, with smaller radii in the more densely populated areas.

We extracted profile data (race and age) for 2666 user profiles. Of these, 1563 (58.63%) were white, 810 (30.38%) were black, 146 (5.48%) were some other race, and 147 (5.51%) did not report a race in their profile. The mean age was 31.5 years, with 591 (22.17%) between the ages of 18-25, and 496 (18.60%) between the ages of 26-30. Age was more likely than race to be missing from profile information, with 593 (22.24%) of profiles sampled not providing age information. The remaining 37% of profiles reported ages greater than 30; whites were more likely than blacks to report being >30 years of age (46% vs 25%, P<.001). Black users were younger than white users (median 28 vs 33 years, P<.001 via the Wilcoxon Sign rank test).

Across the 79 sampled points, the mean number of users was 33 per square mile, but the distribution of users across points was highly skewed with median of 17 and range 0.86-208 (Figure 5).
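For orientation, a per-point density of this kind can be computed as a profile count divided by the area of the circular sampling region. This formula is an assumption based on the description of the sampling areas; the function name is illustrative, and the example count and radius are taken from the 12-profile example given earlier.

```python
import math

def density_per_square_mile(profile_count, radius_miles):
    """Profiles per square mile within a circular sampling area (assumed formula)."""
    area = math.pi * radius_miles ** 2
    return profile_count / area

# 12 profiles observed within a 2-mile radius of a sample point.
example = density_per_square_mile(12, 2.0)
print(round(example, 2))  # 0.95
```

Under this assumption, points where the sampling radius shrinks because many users are nearby produce much larger densities, which is consistent with the skewed distribution reported above.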

Figure 6 shows the density of app users, smoothed using a kernel density function with a 2-mile radius, for white (A) and black (B) users. A 2-mile radius was chosen as the smoothing parameter because it was the next largest integer that covered the average radius of 1.6 miles in the sampled points and also was the maximum distance to which we sampled data when a sample point had fewer than 50 users. Multimedia Appendix 1 shows the analogs of Figures 6 and 7 with a 1-mile kernel density smoothing parameter for comparison; the results were not qualitatively different. The highest density of white users (the darkest blues in the first panel in Figure 6) concentrates in Midtown Atlanta (roughly bounded by the yellow rectangle on the map). While much of the highest density of black users also concentrates in this area, it is clear that there are areas with high densities of black users further south and to the west (to the lower left) of Midtown. The kernel approach smooths observations according to a two-dimensional distribution centered at the observed point and declining out to the radius used to define the search area, essentially “spreading” observations from sample points across the study area. For example, the density values for white users over the 79 sample points ranged from 0.3 to 154 profiles per square mile, but the range of values for the smoothed density shown in the first panel in Figure 6 was 0-57 profiles per square mile. For the 1-mile smoothed density (Multimedia Appendix 1) the range (0-138) was closer to the observed values, but with many more points with density estimates of zero (ie, observations were not “spread” as far).

There are several ways to compare surfaces to illustrate local differences between the densities of white and black users. Figure 7 shows two similar but nonidentical ways to compare these densities. Panel A in Figure 7 shows the difference between the two surfaces, colored so that areas with higher absolute density of white users are blue and areas with higher density of black users are red. Panel B in Figure 7 shows the relative difference, with areas where the ratio of black to white profile densities is higher than one as red and lower than one as blue.

The ratio measure shows that most of Southwest Atlanta has relatively more black user profiles observed than white profiles, but when we compare the map with that of the overall number of black users, we find a much smaller region in which to focus efforts, that is, south and west of Midtown, shown with a yellow band in Figure 7.

A third way to visualize differences between the surfaces is to focus on the areas with extreme values. This provides a within-density comparison: over the entire surface of the density of black user profiles, where is the density the greatest? In Figure 8, we highlight the regions with density greater than the mean + 2 standard deviations over the entire map, separately for all white (A), all black (B), and young black (≤25 years old, C) users based on data in their observed profiles. This approach again highlights Midtown Atlanta (yellow rectangle) as the region with the most users observed in each graph.

Figure 9 calculates the difference between the first two panels in Figure 8 and shows that black user profiles have high density much further south than white profiles.

The third figure included in Multimedia Appendix 1 compares the difference between the 1-mile smoothed densities for young black and all black users (an analog to Figure 9 but comparing panels B and C of Figure 8). Overall the results are similar, but there are a few additional areas (highlighted in the Multimedia Appendix 1 figure) with extreme densities of young black users that did not appear in the 2-mile estimates (shown in Figure 8C or Figure 9).
