In business, economics, and the social sciences, exploring data is paramount. Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) stand out as technologies that enable researchers to delve into scanned documents and efficiently extract valuable data.
Modern text recognition technology converts historical and contemporary documents that exist only in non-editable formats, such as scans, into machine-readable text that can be searched, analyzed, and edited. This enables a deeper understanding of past business trends and economic shifts, which is critical for predictive analytics and future market strategies. It also provides invaluable insight into historical social patterns and behaviors.
As such, OCR/HTR is a key technological ally in bridging the past with the future in both business and social science research. Existing and upcoming BERD OCR services will continue to help researchers unlock, explore, and share this treasure trove of data.
OCR Recommender
The OCR Recommender Service is a web application crafted to simplify your OCR projects. Forget the information overload; our platform asks targeted questions to grasp your project’s nuances, ensuring tailored guidance. From image enhancement to data extraction, we cover all modern text recognition facets. Whether you are delving into historical academia or handling sensitive business data, our platform suggests customized solutions for your unique project needs.
OCR Consultation Hour
The OCR Consultation Hour is a monthly Zoom meetup with experts from Mannheim and Tübingen University Libraries. Get personalized guidance on OCR topics, from software to best practices. Perfect for troubleshooting and discovering innovative OCR solutions, it is your chance to tap into the expertise of seasoned professionals and enhance your OCR projects.
OCR on Demand & Viewer
The BERD OCR Viewer is a platform for accessing and analyzing online documents. Load scanned documents via a METS file for instant full-text access. A standout feature is its handling of documents with inadequate or missing full text: for these, it starts an OCR process that produces new, searchable text. Predefined OCR workflows, including models for historical business and social documents, ensure the output is optimized for the content and context of each document.
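To make the METS-based loading more concrete, the following minimal sketch shows how one might check whether a METS file already references full text. It is not the BERD OCR Viewer's implementation. The function names `summarize_file_groups` and `needs_ocr`, the sample document, and the `example.org` URL are hypothetical, and the sketch assumes the file group for full text is labeled `USE="FULLTEXT"`, as is common in METS files prepared for the DFG Viewer; other producers may use different labels.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for METS and XLink.
XLINK = "http://www.w3.org/1999/xlink"
METS_NS = {"mets": "http://www.loc.gov/METS/", "xlink": XLINK}


def summarize_file_groups(mets_xml):
    """Map each top-level fileGrp USE label to the list of file URLs it references."""
    root = ET.fromstring(mets_xml)
    groups = {}
    for grp in root.iterfind(".//mets:fileSec/mets:fileGrp", METS_NS):
        use = grp.get("USE", "")
        hrefs = [
            loc.get("{%s}href" % XLINK, "")
            for loc in grp.iterfind("mets:file/mets:FLocat", METS_NS)
        ]
        groups[use] = hrefs
    return groups


def needs_ocr(mets_xml, fulltext_use="FULLTEXT"):
    """Return True if no full-text files are listed (assumed label: FULLTEXT)."""
    return not summarize_file_groups(mets_xml).get(fulltext_use)


# Hypothetical METS document that lists one page image and no full text.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                            xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="IMG_0001" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/scans/0001.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

if __name__ == "__main__":
    print(summarize_file_groups(SAMPLE_METS))
    print("OCR needed:", needs_ocr(SAMPLE_METS))
```

A document for which `needs_ocr` returns `True` corresponds to the "absent full text" case described above, where a new OCR run would be the natural next step.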
OCR Helpdesk
The OCR Helpdesk is your direct line to our experts for all your Optical Character Recognition queries. Whether you are struggling to choose a tool, planning a project, or are just curious about modern text recognition, our specialists offer personalized solutions based on their wealth of knowledge and experience.
Applied OCR Research
The Applied OCR Research section showcases OCR/HTR results and materials produced in the context of the BERD@NFDI project. Explore how current OCR technology is being used and developed to transform unstructured business and economic text into actionable insights.
Automatic Transcription of Handwritten Old Occitan Language
- Cooperation partners: LMU and BAdW
- Paper: ACL Anthology
- Data: https://huggingface.co/datasets/misoda/dom_project
- Model/Code: https://github.com/EstebanGarces/OcciGen
A Tailored Handwritten-Text-Recognition System for Medieval Latin
- Cooperation partners: LMU, BAdW, and UZH
- Paper: arXiv
- Data: https://huggingface.co/datasets/misoda/MLW_data
- Model: https://huggingface.co/misoda/
- Pipeline: https://pypi.org/project/mlw-lectiomat/0.2.0/