About CiteSeerX

CiteSeerx is an evolving scientific literature digital library and search engine that has focused primarily on the literature in computer and information science. CiteSeerx aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in access to scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeerx attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerx has developed new methods and algorithms to index PostScript and PDF research articles on the Web. CiteSeerx provides the following features.



CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking, using a method known as autonomous citation indexing.

CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles and Kurt Bollacker. The service transitioned to the Pennsylvania State University's College of Information Sciences and Technology in 2003. Since then, the project has been led by Professor Lee Giles.

After serving as a public search engine for nearly ten years, CiteSeer, originally intended as a prototype only, began to scale beyond the capabilities of its original architecture. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system's capabilities. Based on an analysis of problems encountered by the original system and the needs of the research community, a new architecture and data model were developed for the "Next Generation CiteSeer," or CiteSeerx, in order to continue the CiteSeer legacy into the foreseeable future.


  • We gratefully acknowledge current and past support from:
    • The National Science Foundation award CNS-0958143.
    • Microsoft Research
    • NASA
    • Qatar
  • The initial header parsing algorithm used by CiteSeerx was developed by Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. The algorithm was further refined by Levent Bolelli and Isaac Councill.
  • Yang Song developed an initial MyCiteSeer prototype that guided later efforts.
  • Yang Sun contributed the venue analysis code for calculating impact factor statistics.

Open Source Acknowledgements

CiteSeerx is supported by numerous excellent open source applications and libraries. Specifically, we would like to thank all who participated in the development of the following projects:

We Also Recognize

  • Andrew Ng was the first to extract title and author information from the header of PostScript files.
  • The New Zealand Digital Library was the first to index the full text of PostScript research articles.
  • Dr. Eugene Garfield created the idea of citation indexing of the scientific literature.

Special Thanks

Many have contributed to CiteSeer and its continuing development. In a list in which some are surely missing, we would like to thank:

  • Anurag Acharya
  • Joshua Alspector
  • Esam Alwagait
  • Jose Nelson Amaral
  • Anders Ardo
  • Bill Arms
  • Shumeet Baluja
  • Arunava Banerjee
  • Eric Baum
  • Donna Bergmark
  • Levent Bolelli
  • Kurt Bollacker
  • Shannon Bradshaw
  • Vivek Bhatnagar
  • Jay Budzik
  • Robert Cameron
  • Jack Carroll
  • Rich Caruana
  • Ingemar Cox
  • Sandip Debnath
  • Seyda Ertekin
  • Scott Fahlman
  • Umer Farooq
  • Gary Flake
  • Ed Fox
  • Eugene Garfield
  • Susan Gauch
  • Bill Gear
  • Paul Ginsparg
  • Eric Glover
  • Abby Goodrum
  • Marco Gori
  • Allan Gottlieb
  • Jim Gray
  • Hui Han
  • Mike Halm
  • Steve Hanson
  • Stevan Harnad
  • Eric Hellman
  • Geoff Hinton
  • Haym Hirsh
  • Steve Hitchcock
  • Jian Huang
  • Kirby Huntsinger
  • Gerd Hoff
  • Ernesto Di Iorio
  • Jim Jansen
  • Shannon Johnson
  • Paul Kantor
  • Madian Khabsa
  • Jon Kleinberg
  • Thomas Krichel
  • Bob Krovetz
  • Carl Lagoze
  • Andrea LaPaugh
  • Steve Lawrence
  • Wang-Chien Lee
  • Jay Lepreau
  • Michael Lesk
  • Huajing Li
  • Marco Maggini
  • Eren Manavoglu
  • Andrew McCallum
  • Chris Milito
  • Steve Minton
  • Tom Mitchell
  • Finn Nielsen
  • Michael Nelson
  • Craig Nevill-Manning
  • Andrew Ng
  • Andrew Odlyzko
  • Frank Olken
  • David Pennock
  • Yves Petinot
  • Brian Pinkerton
  • Alexandrin Popescul
  • Augusto Pucci
  • Betsy Richmond
  • Ben Schafer
  • Bruce Schatz
  • Terrence Sejnowski
  • Anand Sivasubramaniam
  • Warren Smith
  • Yang Song
  • Amanda Spink
  • Yang Sun
  • Harold Stone
  • Pucktada Treeratpituk
  • Kostas Tsioutsiouliklis
  • Valerie Tucci
  • Lyle Ungar
  • Frits Vaandrager
  • Moshe Vardi
  • David Waltz
  • James Ze Wang
  • Simeon Warner
  • Ian Witten
  • John Yen
  • Maria Zemankova
  • Hongyuan Zha
  • Ding Zhou
  • Ziming Zhuang


We are very thankful for the generous support that our sponsors have provided. CiteSeerx would not exist without it.
If you are interested in sponsoring CiteSeerx, please contact Professor Giles.