The Future Information Infrastructure in Economics


Databases, Access to Data, and Indices

There is a striking difference between the amount of external reviewing received by the typical journal article and the amount received by the data sets and programs underlying the article. Many published papers are refereed by at least two reviewers, yet the foundation of many papers--the data sets and the programs that use them--is very rarely reviewed. There is little reason not to require publishing data sets--journals certainly would not publish theorems without their proofs, so why publish empirical results without their evidence? The availability and quality of the data and programs used in many publications appear to be suspect. As described by Dewald et al. (1986) for the JMCB Project, only 35% of authors asked by the editor to supply programs and data after publication did so. Of the data sets collected, only 15% were judged to be complete. Replications were attempted with the data sets from nine papers; only two articles could be replicated exactly, and another two quite closely. Anderson and Dewald (1994) found generally similar results. If journals required authors to place their data sets online, availability would clearly improve, and quality likely would as well, since the empirical work could be reviewed publicly.

A primary reason for the lack of access to data, of course, has been the technological difficulty of distributing it. However, it is now possible to archive data and programs. In fact, three economics journals, the Journal of Business and Economic Statistics, the Journal of Applied Econometrics, and the Review (St. Louis Federal Reserve), strongly request or require data sets to be archived at their sites before the article is published. Anderson and Dewald (1994) report the St. Louis Fed's experience of requesting data sets and programs ahead of time: ``Authors generally found it imposed little burden to submit data and programs with their manuscripts so long as they were aware of the requirement in advance.'' In addition, the JBES and JAE seem to have had little difficulty in obtaining data sets, since their requests are also made before publication.

Some argue that authors should restrict access to their data sets, but except for a very few cases involving proprietary or confidential data sets, the freest possible disclosure of information seems more appropriate. The NSF mandates public disclosure of data from studies it funds, and much of that data is available online at the Inter-university Consortium for Political and Social Research. The Publication Manual of the APA tells its members to retain their data for a minimum of five years, and to make it available to all ``competent professionals'' as long as confidentiality and legal restrictions are upheld (APA, 1994, pp. 283, 298). NASA (1996) makes data from its newest series of space probes, the Discovery series, available when the data is collected. One suspects that these organizations have made an implicit cost-benefit calculation and have come out in favor of open access to data. Finally, closer to home, the official policy of the AER is to publish only papers with data that is ``clearly and precisely documented and readily available.'' An online archive would implement this policy directly.

Online archives for data sets and programs are likely to change some professional incentives. As noted by Hare and Wyatt (1992) and countless other economists, those who generate data seldom receive much credit. Currently, the best way to receive a return from generating data is to use the data in a published article. But in the networked world, journals can archive data sets used in published articles and could even have sections solely devoted to publishing data sets. Someday perhaps a citation to an author's data will be a worthwhile addition to her vita.

Computer networks have already changed access to many types of data. For example, U.S. government agencies offer a substantial amount of data through the Internet. With the exception of data from the Commerce Department's Bureau of Economic Analysis, this data is freely available. This access is all but mandated by OMB Circular A-130 (Office of Management and Budget, 1996), and section 3506.d of the more recent Paperwork Reduction Act of 1995. State and local governments in the United States, as well as international agencies such as the IMF and OECD that receive U.S. funds, should be encouraged to follow the U.S. example in disclosure of data. After all, since this data is produced for policy makers in the first place, the marginal cost of putting it online is quite low.

To find publications, working papers and data sets, directories or databases are essential. Without them, one is effectively in a library without a card catalog. Several indices of information for economists already exist on the Internet: Resources for Economists on the Internet and WebEc are two general indices; BibEc is a database of hard copy working papers; WoPEc is a bibliographic database of online working papers; EconWPA is an automated archive of online working papers; CodEc and Econometrics Laboratory Software Archive (ELSA) are databases of programs in economics; and Guide to Available Mathematical Software (GAMS) lists 10,000 mathematical and statistical programs. There are also many specialized databases that cover specific subfields, interests and topics (these are described in Resources for Economists on the Internet and WebEc).

While economists are understandably skeptical about monopoly providers, a single database for each type of information may well be preferable to multiple, partially overlapping or disjoint indices. After all, single databases appear to work reasonably well for the phone system, for most libraries, and for Internet hosts (the Domain Name System, or DNS, which lies at the heart of the Internet). We believe that just as the AEA has taken the lead in setting up JEL classification codes for indexing articles, it has a role to play in supporting and developing electronic databases for the benefit of its members.



Bill Goffe and Bob Parks Wed Apr 9 20:34:47 CDT 1997
