Sunday, March 11, 2007

Restricting Search Results to a Date Range

Sometimes you want to find a web page written in a certain interval, but Google is not very helpful: it doesn't let you restrict a search to pages written, let's say, between May 2001 and March 2002. There is a good reason for this. Google only records the date when its crawler last visited a page, so it's not easy to find the date when a page was published.

But there are some areas where Google and other search engines allow you to specify an interval for your search:

1. Blog Search. Blogs are a recent phenomenon, and the oldest blog posts indexed by Google Blog Search date back to October 1999.

Restrict your search to blogs if you want to see what ordinary people think about an event, a product or a web page. After you perform a search, click on "Choose dates" in the sidebar.

2. News Archive is an index of historical content, including major newspapers, magazines and news archives.

To restrict the date, go to advanced search and enter an interval. Note that most of the content from Google News Archive requires a subscription. A good way to get free access to some of the archives is to install the Congoo toolbar (Windows only).

You can also find good lists of news archives, like this one.

3. Google Books, Google Scholar and Google Patents are specialized search engines for books, scholarly papers and patents. Their advanced search options offer different ways to choose a time interval.

4. Google Groups indexes Usenet archives and is an excellent place to search for old discussions that date back to 1981.

You can find, for example, the first mention of AIDS:
"Last I heard, the cause or means of transmission of AIDS was not known. I would appreciate any pointers to well-documented claims to the contrary."

"The disease sounds very frightening. I had heard about it about two weeks ago. Seems like the public should be more aware of it."

5. Google offers some options regarding the date when a page was updated, but they are pretty fuzzy: you can restrict the results to pages updated in the past 3, 6, or 12 months.

There's also a daterange operator, with the syntax daterange:startdate-enddate. Unfortunately, the dates must be entered as Julian dates (the Julian date is the number of days since January 1, 4713 BC). The conversion is simple, so you can create a form that lets you restrict a search to a date range.
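As a rough illustration of the conversion, here is a minimal Python sketch. It assumes the operator takes whole-number Julian day numbers, and it relies on the fact that Python's date ordinal differs from the Julian day number by a constant. The function names julian_day and daterange_query are made up for this example, and the sample search term is arbitrary.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 is day 1)
# and the Julian day number. Check: 2000-01-01 has ordinal 730120 and
# Julian day number 2451545.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the whole-number Julian day for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(terms: str, start: date, end: date) -> str:
    """Append a daterange: restriction to a search query (hypothetical helper)."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"{terms} daterange:{julian_day(start)}-{julian_day(end)}"


if __name__ == "__main__":
    # Pages from May 2001 through March 2002, as in the example above.
    print(daterange_query("usenet archive", date(2001, 5, 1), date(2002, 3, 31)))
```

You would paste the resulting string into the search box. The operator filters on whatever date Google has stored for a page, so the results are only as reliable as that date.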



6. Alltheweb, now owned by Yahoo, has a similar option in the advanced search and it works pretty well.

7. Some query fragments may help you find pages published in a certain year. You can add one of these to your query:

"last updated * * 2003"
"last modified * * 2003"
inurl:2003
"march * 2003"
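If you try these often, a small script can generate the variants for any year. This is only a sketch: the helper name year_queries is invented for this example, and the patterns are the four listed above with the year and month swapped in.

```python
from typing import List


def year_queries(terms: str, year: int, month: str = "march") -> List[str]:
    """Return the search terms combined with each year-hinting pattern."""
    patterns = [
        f'"last updated * * {year}"',
        f'"last modified * * {year}"',
        f"inurl:{year}",
        f'"{month} * {year}"',
    ]
    # One query per pattern; run each one separately.
    return [f"{terms} {pattern}" for pattern in patterns]


if __name__ == "__main__":
    for query in year_queries("search engines", 2003):
        print(query)
```

Each line is a separate query. Putting all the patterns into one search would require a page to match every one of them.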

8. Once you have found a page, you might be interested to see how it looked in the past. The Wayback Machine has been capturing pages since 1996, but it doesn't offer a way to search through those copies.

9. Google Desktop and Search History build personal archives ordered by the date you entered a query or visited a web page. Google Desktop even caches each new version of a web page, so it builds your own Wayback Machine.

But, as you can see, except for some specialized search engines, Google doesn't have a memory: it's impossible to search the Internet as it was in 2002, or to determine the precise date when a document was written.

As this article from First Monday concludes, "search engines are unreliable tools for data collection for research that aims to reconstruct the historical record or for research that aims to analyze the structure of information at a particular moment in history."
