The contrast between the amount of external review an article receives and the amount received by the data and programs underlying it is striking. Many, if not most, published papers are refereed by at least two reviewers, and a substantial number are reviewed again after resubmission. An accepted paper is then carefully copy-edited. Yet the foundation of many papers--the data and the programs that use the data--is very rarely reviewed. In an ideal world, referees would review them as well--just as the proofs in theory papers are reviewed. A step in this direction now exists--the data and programs used in an article can be archived on-line. Three economics journals, the Journal of Business and Economic Statistics, the Journal of Applied Econometrics, and the Review of the Federal Reserve Bank of St. Louis, request, if not require, that data be archived at their sites before the article is published.
Two studies on the availability and quality of data and programs are quite instructive. In the JMCB Project (Dewald, Thursby, and Anderson (1986)), authors of papers submitted to or accepted by the JMCB were asked to supply the data and programs used in their papers. Of those asked after their paper was published, only 35% did so (32% never even responded to the letter from the editor). However, 72% of those whose paper was still under review and 78% of those whose paper was accepted, but not yet published, were able to supply the data and programs. Unfortunately, of the datasets actually collected, only 15% were judged to be complete. Finally, replications were attempted with the data from nine papers. Only two articles could be replicated exactly, and another two quite closely.
A follow-up study by Anderson and Dewald (1994) of the Federal Reserve Bank of St.
Louis found generally similar results for papers given at one of
the Bank's conferences--if data was requested after the fact, authors had
trouble providing it. But if the data was requested ahead of time,
``Authors generally found it imposed little burden to submit data and
programs with their manuscripts so long as they were aware of
the requirement in advance, although the Bank's staff had to make
some follow-up calls to clarify documentation.''
Thus, a researcher interested in obtaining someone else's data and programs will likely encounter the same problems, which suggests that much of the empirical work in our literature cannot easily be checked. But these problems can be ameliorated with an on-line archive, at very little cost to the journals (running an archive site is relatively simple and inexpensive, particularly if it is centralized by an organization such as the AEA) and little cost to the authors (if they are notified ahead of time). In short, the new technology of networks offers a real advance to the profession at low cost.
Some argue that authors should restrict access to their data,
but there are several arguments against this. A primary goal of
academia is the freest possible exchange of information. For instance,
the NSF mandates public disclosure of data from studies it funds (much
of it is available on-line at the Inter-university Consortium for
Political and Social Research). The
Publication Manual of the American Psychological Association
tells its members to retain their data for a minimum of five years,
and to make it available to all ``competent professionals'' as long
as confidentiality and legal restrictions are upheld (American Psychological Association (1994),
pp. 283 and 298). Finally, NASA makes data from its newest series
of space probes, the Discovery series, available when the data is
collected (NASA (1996)). Surely the investigators who build and run
these probes, a process that takes years, have more at stake in their
data than almost any economist.
The economics of this issue are interesting. First, as noted by
Hare and Wyatt (1992) and observed by countless economists, those who generate
data seldom receive much credit. This is partially due to
the previous technological infeasibility of distributing data, a
constraint that no longer holds. Currently the best way to receive a return from
generating data is to use the data in publications. In the electronic
world, journals can not only archive data used in published articles,
but can also have sections solely devoted to publishing data. Releasing
the data in an article, in the current world, can reduce the incentive
to generate it--an author may not have time to produce all the papers
for which the data is a base before others write those papers. On the
other hand, preparing data for release likely improves its quality
and a citation to the data should be equally or more valuable than a
citation to the article. Further, the organizations described above
(NSF, APA, NASA, JAE, JBES) apparently have decided that the benefits
of disclosure for generating more knowledge outweigh this reduced
incentive. Finally, given that ``publication'' of the data is possible
in an electronic world, there is little reason not to require it--we
certainly would not publish theorems without their proofs, so why
publish empirical results without their proof?
Computer networks may change access to other sorts of data. Already, U.S. government agencies offer a very substantial amount of data over the Internet. With the exception of data from the Commerce Department's Bureau of Economic Analysis, it is freely available. This access is all but mandated by OMB Circular A-130 (OMB), and section 3506.d of the more recent Paperwork Reduction Act of 1995 (PRA) generally mandates public access to government data through ``a diversity of public and private sources'' at low cost. Interestingly, the resale of U.S. data is generally permitted. While the profession can hardly mandate that other governments follow the lead of the U.S. government, they should be encouraged to do so--after all, their taxpayers paid for the data's collection. International agencies that receive U.S. funds, such as the IMF and OECD, should be encouraged to follow the U.S. lead as well.
To find publications, working papers and data, directories or
databases are essential. Without them, one is effectively in a
library without a card catalog. Already there are several indices
of information for economists on the Internet: ``Resources
for Economists on the Internet'' and ``WebEc''
are two of several general indices. ``BibEc'' is a
database of ``hard copy'' working papers, while
``WoPEc'' and ``EconWPA''
are databases of electronic working papers.
``CodEc'' and ``ELSA'' are databases of programs in economics, and
``GAMS'' (Guide to Available Mathematical Software) indexes
mathematical software. Finally, there are also many specialized
databases that cover specific subfields, interests and topics
(which are described in ``Resources for Economists on the Internet''
and ``WebEc'').
While economists are skeptical about monopoly providers, a single
database for each type of information may well be preferable to
multiple, partially or non-overlapping indices. For instance, if each
publisher had a separate database, only accessible to subscribers,
and the information was not available elsewhere, networks would
not provide much additional benefit to the profession in this
area. Single databases appear to work reasonably well for the phone
system, for libraries, and for Internet
hosts (the Domain Name System, or DNS, which lies at the heart of
the Internet). The trick, of course, is for the provider to face the
correct incentives to operate in the best interests of the profession.
Just as the AEA provides the Journal of Economic Literature to
its members, it also needs to support and develop electronic databases
for the benefit of its members.