Apparently, there's some kind of football game going on here in the US this weekend. Strangely though, the ball isn't round. The playing field isn't even oval. No, this is American Football.
Posted by David Smith at 10:46 in sports, statistics | Permalink | Comments (0) | TrackBack (0)
png(file="mygraphic.png",width=400,height=350)
plot(x=rnorm(10),y=rnorm(10),main="example")
dev.off()
png(file="animals72.png",width=400,height=350,res=72)
plot(Animals, log="xy", type="n", main="Animal brain/body size")
text(Animals, lab=row.names(Animals))
dev.off()
R is assuming the graph area is about 5.56 inches across (400 pixels at 72 pixels per inch), so the default text size is large relative to the graph itself. You can correct this with the res= argument to png, which specifies the number of pixels per inch. The smaller this number, the larger the plot area in inches, and the smaller the text relative to the graph itself. Let's see what happens when you drop this down to 45 pixels per inch:
png(file="animals45.png",width=400,height=350,res=45)
plot(Animals, log="xy", type="n", main="Animal brain/body size")
text(Animals, lab=row.names(Animals))
dev.off()
Note the title is smaller, and the text labels are smaller too, making for a less-crowded plot. I like to choose a resolution that gives me an X dimension in the 8-10 inches range (here 400/45 = 8.89 inches).
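The arithmetic behind that rule of thumb is easy to script. Here is a minimal sketch; implied_inches and res_for_width are hypothetical helper names made up for illustration, not part of R or of the downloadable script:

```r
# Hypothetical helpers (not part of base R): relate pixel size, res and inches.

# Width (or height) in inches that R assumes for a given pixel size and res
implied_inches <- function(pixels, res) pixels / res

# res value that makes a given pixel size correspond to a target size in inches
res_for_width <- function(pixels, inches) pixels / inches

w72 <- implied_inches(400, 72)   # 400 / 72, roughly 5.56 inches
w45 <- implied_inches(400, 45)   # 400 / 45, roughly 8.89 inches
r9  <- res_for_width(400, 9)     # 400 / 9, roughly 44.4 pixels per inch
```

So to land in the 8-10 inch range for a 400-pixel-wide graphic, any res between 40 and 50 will do.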
png(file="notitle.png",width=400, height=350)
par(mar=c(5,3,2,2)+0.1)
hist(rnorm(100),ylab=NULL,main=NULL)
dev.off()
In this version, the text is much easier to read and the lines appear smoother.
If you don't have anti-aliasing on your system (and can't recompile R to enable it), you can use the poor-man's anti-aliasing trick: generate the graph in double the resolution, and display it at half the size. The browser will handle the anti-aliasing, at the expense of additional bandwidth for your graphic.
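As a rough sketch of that trick (the file name and the temporary output directory are my own choices for illustration), double the pixel dimensions and the res together, so the plot keeps the same size in inches and the text keeps the same proportions:

```r
# Sketch only: render at 2x the intended display size.
# 800 x 700 pixels at res = 144 is the same size in inches as 400 x 350 at res = 72.
out <- file.path(tempdir(), "hist2x.png")   # illustrative output location
png(filename = out, width = 800, height = 700, res = 144)
par(mar = c(5, 3, 2, 2) + 0.1)
hist(rnorm(100), ylab = NULL, main = NULL)
dev.off()

# Then display it at half size in the page, for example:
#   <img src="hist2x.png" width="400" height="350">
```

The browser does the downscaling, which is what smooths the lines.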
Of course, the most important tip for making your graph look good is: make a good-looking graph! Graphical display of quantitative data is in some ways more art than science, but as a general rule it takes time and effort to make a truly effective display that lets your data tell the story it needs to tell. Fortunately, R provides you with all the tools you need to pull out all the details, make the right comparisons, and make the results pleasing to the eye. Don't be satisfied with the "stock" graphs from the top-level functions like plot or hist. Make liberal use of the annotation functions like text and lines, and experiment with choices of color, layout, and size.
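To give a flavor of what layering annotations looks like, here is a small sketch using the cars data set that ships with R. The particular smoother, colors and label text are arbitrary choices for illustration, not a recommendation:

```r
# Sketch: start from an empty frame, then add each layer yourself.
plot(cars, type = "n", main = "Stopping distance vs. speed")
points(cars, pch = 16, col = "grey40")

# Overlay a smooth trend with lines()
fit <- lowess(cars$speed, cars$dist)
lines(fit, col = "red", lwd = 2)

# Label the most extreme observation with text()
far <- which.max(cars$dist)
text(cars$speed[far], cars$dist[far], labels = "longest stop",
     pos = 2, col = "blue")
```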
There are many good resources for learning about making good graphical displays, but my favorite is Tufte's classic: The Visual Display of Quantitative Information. Not only is it chock-full of wonderful examples and sensible guidelines for displaying data, it makes a beautiful coffee-table book to show your non-statistician friends that Statistics is about more than just numbers.
If you want to download the scripts that generated the graphs in this article, you can get them here:
Download graphexamples.R (1.4K)
Posted by David Smith at 16:27 in advanced tips, graphics, R | Permalink | Comments (15) | TrackBack (0)
One of the most distinctive and powerful aspects of R is its ability to create statistical graphics beyond the limited palette found in off-the-shelf graphing tools like Excel. Especially for novices to data presentation, it can be difficult to grasp how much more meaning can be extracted from data when you have the tools to combine science and art creatively to create unique visualizations. (As an aside, I've been pleased to see that this is an idea that has been coming into the mainstream recently: the New York Times, for example, has in recent years had some truly outstanding displays of data, both static and interactive. There was a fascinating article about the people behind those graphics in New York magazine last week.)
Posted by David Smith at 09:59 in advanced tips, graphics, R | Permalink | Comments (3) | TrackBack (0)
REvolution R, the high-performance distribution of R from REvolution Computing, is now available for download for Windows and MacOS X systems from the REvolution Computing website. (The software has actually been available for a little while, but has only been formally announced in a press release today.)
Posted by David Smith at 14:51 in announcements, Revolution | Permalink | Comments (0) | TrackBack (0)
Andrew Abela: Choosing a good chart.
Posted by David Smith at 14:50 in graphics | Permalink | Comments (0) | TrackBack (0)
The Bay Area UseR Group will be meeting in San Francisco on Wednesday, February 18 at 7:30PM. The featured event will be a panel discussion: "pRediction: A quick survey of prediction methods in R". The panel members will include:
Posted by David Smith at 13:31 in events | Permalink | Comments (0) | TrackBack (0)
R is fast becoming a powerful tool for high-performance computing: the art of making computational problems that take a long time to process run faster through the use of multiprocessor computers or computer clusters.
Posted by David Smith at 12:50 in advanced tips, high-performance computing, R | Permalink | Comments (0) | TrackBack (0)
In what has become an ongoing series of R tutorials, here's another Introduction to R document, by James Monogan. If you're familiar with interactive programming (though not with R) and don't have a lot of time, this might be the introduction for you: it takes you through the basics of R at an efficient clip. In its 26 pages you'll learn how to:
Posted by David Smith at 08:13 in beginner tips, R | Permalink | Comments (0) | TrackBack (0)
Michael Friendly asks an interesting question on the r-help list: how can you generate a title where the words are in different colors, like this:
Hair color and Eye color
(Michael suggests a title like this might serve as an implicit legend for the points plotted in the graph below the title.)
The title function allows you to change the color of the text using the col argument, but that color is applied to the entire text string -- there's no obvious way to set the color of individual words.
Or is there? Barry Rowlingson offers an elegant solution that uses the "overhead transparency" principle of R graphics: you can overlay additional graphical elements one atop another, to build up your graph layer by layer. So you could add the title Hair color in red on the left, and Eye color in blue on the right, and put a black "and" in the middle. The trick is in the positioning -- it could take a lot of trial and error to get the x position of each element correct. But if you plot the same text three times in three different colors, but leave some words blank (so they won't overlay previously plotted elements) you don't have to worry about positioning at all. The phantom notation allows you to do that, as shown in Barry's solution:
plot(rnorm(20),rnorm(20),col=rep(c("red","blue"),c(10,10)))
# Draw the title three times, once per color; phantom() reserves
# space for the other words without drawing them
title(expression("Hair color" *
    phantom(" and Eye color")),col.main="red")
title(expression(phantom("Hair color and ") *
    "Eye color"),col.main="blue")
title(expression(phantom("Hair color ") *
    "and " * phantom("Eye color")),col.main="black")
The phantom notation means "leave room for this, but don't draw it" -- see help(plotmath) for other examples. Barry also provides a function multiTitle to create multicolor titles in a single command:
multiTitle(color="red","Hair color", color="black",
" and ",color="blue","Eye color")
Another solution (suggested by Duncan Murdoch) is to use the strwidth function to calculate the widths of words and use this information to set the x position of individual words, as demonstrated in his technicolorTitle function. However, as this solution is implemented using the mtext function, the results can be slightly different from what title usually produces.
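To make the strwidth idea concrete, here is a minimal sketch of the general approach. It is not Duncan's technicolorTitle: color_words_title is a hypothetical function written for illustration, and it assumes linear axes and the default par("cex") of 1.

```r
# Hypothetical helper (not the technicolorTitle function from the post):
# center a run of words above the plot, each word in its own color.
# Assumes linear axes and par("cex") == 1.
color_words_title <- function(words, colors, cex = 1.2, line = 1) {
  stopifnot(length(words) == length(colors))
  # Width of each piece in user (x-axis) coordinates, measured in bold
  widths <- strwidth(words, units = "user", cex = cex, font = 2)
  # Left edge of the whole title so that it is centered on the plot region
  start <- mean(par("usr")[1:2]) - sum(widths) / 2
  # Left edge of each piece: start plus the widths of the pieces before it
  lefts <- start + c(0, head(cumsum(widths), -1))
  for (i in seq_along(words)) {
    mtext(words[i], side = 3, line = line, at = lefts[i], adj = 0,
          col = colors[i], cex = cex, font = 2)
  }
  invisible(lefts)
}

plot(rnorm(20), rnorm(20), col = rep(c("red", "blue"), c(10, 10)))
lefts <- color_words_title(c("Hair color", " and ", "Eye color"),
                           c("red", "black", "blue"))
```

The positions have to be computed after the plot exists, since strwidth measures text in the current plot's coordinate system.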
You can download the code to create the graph above, and for the multiTitle and technicolorTitle functions here: Download colortitles.R (2.3K)
Posted by David Smith at 14:36 in advanced tips, R | Permalink | Comments (6) | TrackBack (0)
For newcomers to R who have at least a basic background in the principles of statistical analysis, John Maindonald has contributed an introductory guide to R: Using R for Data Analysis and Graphics. It uses a series of data sets and example R code to take the beginning user through launching R (on a Windows system; installing R is not covered), executing simple commands at the command-line, understanding objects and R function calls, graphics, and some simple statistical modeling techniques. Later chapters do touch on some more advanced modeling methods and how to program your own functions, but these sections can safely be ignored by the beginning R user.
Posted by David Smith at 16:01 in beginner tips, R | Permalink | Comments (1) | TrackBack (0)
Follow David on Twitter: @revodavid