Accessing academic article OAI repositories

In investigating academic stuff, I’d previously wondered why repositories of academic papers didn’t do more with their searching capacities, and specifically with visualising the connections between papers. Well, a bit of digging has clarified things. CiteSeerX (a site that holds big lists of papers, their references and citations) uses the OAI standard (OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting) to make its data accessible. This standard isn’t really designed for searching, but for the exchange of metadata sets. The verbs that can be used to access the data won’t, for instance, allow the location of a paper with a specific title. They will allow the harvesting of the metadata between two dates. So the process looks like:

1) Find out the metadata schemas available (go on, try this at home!): http://citeseerx.ist.psu.edu/oai2?verb=ListMetadataFormats
We get back some XML telling us that they offer Dublin Core, “oai_dc”.

2) Ask for metadata within certain criteria (only dates, which is the limiting part): http://citeseerx.ist.psu.edu/oai2?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2010-01-01
We get back XML giving us records with their timestamp and identifier.

3) Ask for the detailed record using that identifier: http://citeseerx.ist.psu.edu/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:CiteSeerXPSU:10.1.1.1.1952
Now we have something that includes the paper’s title and details.

So if we wanted to use this to locate the details of a paper by, say, its title, we basically have to download a copy of the repository so we can conduct a search as we wish. This isn’t a failing of OAI; it’s designed for swapping metadata. If you want to search and do something whizzy with the data, that’s your problem! Effectively, sites like CiteSeerX are adding extra capacities over the raw data, though it does feel very basic and clunky.
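The three steps above, plus the "download it all, then search it yourself" idea, can be roughed out in Python. This is only a sketch under assumptions: the helper names (oai_url, fetch, parse_identifiers, parse_title, find_by_title) are made up for illustration, the endpoint is the one quoted above and may not still respond, and the parsing assumes the standard OAI-PMH and Dublin Core XML namespaces.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint quoted in the post; it may no longer be live.
BASE_URL = "http://citeseerx.ist.psu.edu/oai2"

# Assumed namespaces: standard OAI-PMH 2.0 and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def oai_url(verb, params=None):
    """Build a request URL for an OAI verb plus optional arguments."""
    query = {"verb": verb}
    query.update(params or {})
    return BASE_URL + "?" + urlencode(query)


def fetch(url):
    """Download the raw XML bytes for a request (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def parse_identifiers(xml_data):
    """Return (identifier, datestamp) pairs from a ListIdentifiers reply."""
    root = ET.fromstring(xml_data)
    return [
        (
            header.findtext("oai:identifier", namespaces=NS),
            header.findtext("oai:datestamp", namespaces=NS),
        )
        for header in root.iterfind(".//oai:header", NS)
    ]


def parse_title(xml_data):
    """Return the first Dublin Core title in a GetRecord reply, or None."""
    root = ET.fromstring(xml_data)
    return root.findtext(".//dc:title", namespaces=NS)


def find_by_title(records, needle):
    """Search locally harvested (identifier, title) pairs by title text."""
    needle = needle.lower()
    return [(i, t) for i, t in records if t and needle in t.lower()]


if __name__ == "__main__":
    # Only prints the three request URLs; nothing is downloaded here.
    print(oai_url("ListMetadataFormats"))
    print(oai_url("ListIdentifiers",
                  {"metadataPrefix": "oai_dc", "from": "2010-01-01"}))
    print(oai_url("GetRecord",
                  {"metadataPrefix": "oai_dc",
                   "identifier": "oai:CiteSeerXPSU:10.1.1.1.1952"}))
```

Note that urlencode percent-encodes the colons in the identifier, which is still a valid request. A real harvest would also need to follow the resumptionToken that OAI-PMH uses to page through long lists, which is left out here. The search itself only happens in find_by_title, on data already pulled down, which is exactly the limitation described above.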
So although it’s tempting to download the metadata and add my own search capacities and visualisations (something that has been done), I’ll not get distracted!