Other Projects
Posted: May 20, 2022
This is a Chrome extension that detects reCAPTCHAs on a Web page and invites the user to record and preserve their interaction. By detecting specified HTML elements within a Web page, the extension enables researchers to preserve users’ interactions with an interface without needing to continuously (and invasively) record their browsing. The extension aims to balance the priorities of Web preservation with user privacy and autonomy. This represents a new approach to Web preservation that may be useful to other digital humanities projects because it attends to ephemeral user interactions that other preservation tools are not well suited to capture.
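The detection step can be illustrated with a minimal sketch. The extension itself runs in the browser, so this Python version only shows the idea on static HTML. The markers it looks for (a `g-recaptcha` class, or an `iframe`/`script` whose `src` contains `/recaptcha/`) are assumptions about typical reCAPTCHA markup, not the extension's actual selector list, and the names `RecaptchaDetector` and `find_recaptcha` are hypothetical.

```python
from html.parser import HTMLParser


class RecaptchaDetector(HTMLParser):
    """Collect start tags that look like reCAPTCHA markup (assumed markers)."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (tag, reason) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        attr = {name: (value or "") for name, value in attrs}
        classes = attr.get("class", "").split()
        src = attr.get("src", "")
        if "g-recaptcha" in classes:
            self.hits.append((tag, "class=g-recaptcha"))
        elif tag in ("iframe", "script") and "/recaptcha/" in src:
            self.hits.append((tag, src))


def find_recaptcha(html):
    """Return every reCAPTCHA-like element found in an HTML string."""
    parser = RecaptchaDetector()
    parser.feed(html)
    parser.close()
    return parser.hits
```

Because the check only fires when a matching element is present, a tool built this way would prompt the user at that moment rather than recording the whole browsing session.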
Posted: January 1, 2022
Do you have a professor who provides PDFs that look like they’ve been run through a meat grinder? Are you a professor who provides PDFs that contain multiple scanned pages on a single page? We’ve all been there: someone has generously provided us with a PDF copy of a book chapter or article, but none of the text is searchable, and each page of the PDF contains two pages of scanned text. PDFs in this format are not accessible to people using screen readers, and keeping multiple scanned pages on each PDF page makes the document cumbersome to navigate. Adobe Acrobat can make those PDFs more usable, but manually fixing your scans each time is tedious. This Acrobat Action semi-automatically processes a scanned PDF, separating each scanned page onto its own PDF page and running the entire document through OCR to create searchable and selectable text.
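The page-splitting step comes down to simple geometry. The sketch below is not the Acrobat Action itself. It only shows, assuming a two-page spread divided exactly down the middle, how one page box could be cut into a left and a right crop box. The function name `split_spread` and the `(x0, y0, x1, y1)` box layout in PDF points are illustrative assumptions.

```python
def split_spread(box):
    """Split a page box (x0, y0, x1, y1) into left and right halves.

    Assumes the two scanned pages sit side by side and meet at the
    horizontal midpoint; real scans may need a manual offset.
    """
    x0, y0, x1, y1 = box
    mid = (x0 + x1) / 2
    left = (x0, y0, mid, y1)
    right = (mid, y0, x1, y1)
    return left, right


# A US Letter page in landscape is 792 x 612 points.
left, right = split_spread((0, 0, 792, 612))
```

Each half would then become its own page before OCR runs over the whole document.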
Posted: January 1, 2020
Posted: January 1, 2019
This simple Python script uses 4chan's read-only API to scrape the front page of a given imageboard. In addition to saving every image posted to the board, the script generates CSV files that record which threads were on the front page at a given time. It creates a folder for each thread's images, along with a CSV file that records each reply in the thread. I have done some research on anonymous online communities, the ways they communicate with one another, and how they're able to influence real events in the physical world. Rather than manually browsing and downloading content from 4chan imageboards, I built this script to automatically scrape the most recent content from a given board.
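A minimal sketch of how such a scraper might be organized follows. It is not the original script: the function names, CSV columns, and output layout are assumptions made for illustration. It uses the read-only API's `catalog.json` and `thread/<no>.json` endpoints on `a.4cdn.org` and the image host `i.4cdn.org`. Error handling (for example, a thread deleted mid-run) is omitted.

```python
import csv
import json
import time
import urllib.request
from pathlib import Path

API = "https://a.4cdn.org"     # read-only JSON API host
IMAGES = "https://i.4cdn.org"  # image host


def fetch_json(url):
    """Download and decode one JSON document."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def front_page_threads(catalog):
    """Return the thread entries listed on page 1 of a catalog.json payload."""
    for page in catalog:
        if page.get("page") == 1:
            return page.get("threads", [])
    return []


def image_url(board, post):
    """Build the image URL for a post, or None if it has no attachment."""
    if "tim" not in post:
        return None
    return f"{IMAGES}/{board}/{post['tim']}{post['ext']}"


def reply_rows(posts):
    """Flatten a thread's posts into CSV rows (number, time, name, comment)."""
    return [[p["no"], p.get("time", ""), p.get("name", ""), p.get("com", "")]
            for p in posts]


def scrape_front_page(board, out_dir="scrape"):
    """Save a front-page snapshot plus per-thread replies and images."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    threads = front_page_threads(fetch_json(f"{API}/{board}/catalog.json"))

    # One timestamped CSV recording which threads were on the front page.
    snapshot = out / f"{board}_front_{int(time.time())}.csv"
    with snapshot.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["no", "subject", "replies"])
        for t in threads:
            writer.writerow([t["no"], t.get("sub", ""), t.get("replies", 0)])

    for t in threads:
        folder = out / str(t["no"])
        folder.mkdir(exist_ok=True)
        time.sleep(1)  # the API rules ask for at most one request per second
        posts = fetch_json(f"{API}/{board}/thread/{t['no']}.json")["posts"]
        with (folder / "replies.csv").open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["no", "time", "name", "comment"])
            writer.writerows(reply_rows(posts))
        for p in posts:
            url = image_url(board, p)
            if url:
                urllib.request.urlretrieve(url, str(folder / f"{p['tim']}{p['ext']}"))
```

Calling `scrape_front_page("g")` would write one snapshot CSV and one folder per front-page thread under `scrape/`. Running it on a schedule would build up the record of which threads were on the front page at each point in time.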