Asking Good Questions
And the companies are not wrong about genuinely impressive improvements. Six years ago in this newsletter, I wrote about some initial testing I had been doing with computer vision APIs from Google and Microsoft, a first attempt to analyze the photo morgue my library had recently acquired from the Boston Globe. There were glimmers of hope that these pre-GPT tools could help us identify topics in millions of photographs that lacked rigorous metadata, and I found even 80% accuracy to be promising. Now our library’s digital team, much more capable than I am, has created an abstracted interface to all of the main multimodal AI services and is testing the ability of these services to provide subject headings and descriptions, with much better results (although all of the services are still imperfect).
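To make "abstracted interface" a bit more concrete, here is a minimal, hypothetical sketch of what a wrapper over several multimodal services might look like. This is my own illustration, not the digital team's actual code; every class, field, and value below (PhotoDescription, VisionService, the stub provider and its sample output) is invented for the example.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class PhotoDescription:
    """What we ask each service to return for a photograph."""
    subject_headings: list[str]
    description: str


class VisionService(Protocol):
    """Common interface that each multimodal provider is wrapped to satisfy."""
    name: str

    def describe(self, image_bytes: bytes) -> PhotoDescription:
        ...


class StubService:
    """Placeholder provider; a real wrapper would call a vendor's multimodal API here."""

    def __init__(self, name: str):
        self.name = name

    def describe(self, image_bytes: bytes) -> PhotoDescription:
        # Hard-coded sample output, standing in for a model's response.
        return PhotoDescription(
            subject_headings=["Baseball", "Fenway Park (Boston, Mass.)"],
            description="Crowd watching a baseball game from the stands.",
        )


def compare_services(services: list[VisionService], image_bytes: bytes) -> dict[str, PhotoDescription]:
    """Send the same photograph to every wrapped service so the results can be reviewed side by side."""
    return {svc.name: svc.describe(image_bytes) for svc in services}


if __name__ == "__main__":
    photo = b"..."  # stand-in for real image data from the photo morgue
    results = compare_services([StubService("provider-a"), StubService("provider-b")], photo)
    for name, result in results.items():
        print(name, result.subject_headings, "--", result.description)
```

The point of a layer like this is simply that the same photograph can be routed to every vendor and the still-imperfect results compared against one another, and against a cataloger's judgment.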
Fellow historian Benjamin Breen has documented similar advances in his testing of AI. The latest models are scarily on par with a first-year doctoral student in history in some areas, able to provide solid context and advanced interpretations of documents and images, even complex ones that require substantial background in a field. The frontier models are much better than most doctoral students at other tasks, such as translation and transcription. Handwriting recognition for historical documents, in particular, has been among the hardest problems for computer scientists to solve, and cracking it will have a significant impact on historical research. Historian Cameron Blevins has shown that custom GPTs are now on a path to a solution that could make archives and special collections much more searchable and readable, in ways that might transform our ability to do history. What these other tests of artificial intelligence show is that significant AI progress may lie not in some kind of examination endgame of perfect answers to tough questions, but in the important yet often hidden middle stages of a research project, when evidence is being assembled and interpreted.