Where & when
- Aarhus, 19 and 20 November. Both days from 9 a.m. to 4 p.m. Victor Albecks Vej 1, rooms M4 and M5
- Copenhagen, 3 and 4 December. Both days from 9 a.m. to 4 p.m. The Black Diamond, room: Panorama Hall
The software can be downloaded from the websites listed below:
If you run into any problems while downloading or installing the programs, please contact one of the organizers: Max, Martin, or Lars.
Notebooks with code can be found at the Datasprint GitHub page.
The sources selected for this datasprint are digitized historical newspapers from 1830 to 1870. They were digitized by scanning the original pages or microform copies, and then applying OCR (optical character recognition) to recognize the text on the pages.
OCR is an error-prone technology, and the result is far from a precise reproduction of the original text. For example, “ri” is often recognized as “n”, “c” as “e”, and “i” as “l”. These and similar errors can distort the final result of an analysis.
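One way to soften the effect of such errors when searching the texts is to look for the likely misread spellings of a word as well as the correct one. The following is only a minimal sketch in Python: the confusion pairs are the three examples mentioned above, while the function names `ocr_variants` and `count_with_variants` and the sample sentence are made up for illustration.

```python
import re

# Confusion pairs taken from the examples above: (intended text, typical OCR misreading).
OCR_CONFUSIONS = [("ri", "n"), ("c", "e"), ("i", "l")]


def ocr_variants(word):
    """Return the word together with spellings it may have after OCR errors.

    Simplification: each confusion replaces all occurrences in a word at once;
    it does not try every occurrence independently.
    """
    variants = {word}
    for intended, misread in OCR_CONFUSIONS:
        variants |= {v.replace(intended, misread) for v in variants}
    return variants


def count_with_variants(text, word):
    """Count tokens in `text` that match `word` or one of its OCR variants."""
    variants = ocr_variants(word.lower())
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token in variants)


# Invented sample sentence containing the word "pris" and two misread forms.
sample = "Pris og pns og prls"
print(sorted(ocr_variants("pris")))
print(count_with_variants(sample, "pris"))
```

Note that a variant can also be a genuine word in its own right, so counts obtained this way may include false matches and should be checked against the scanned pages.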
The newspapers from 1830 to 1870 are no longer under copyright protection. This means that there are no restrictions on experimenting with the texts, for example by applying text mining algorithms.
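As an illustration of what a very simple text mining experiment could look like, the sketch below counts the most frequent words in a text using only the Python standard library. It is not taken from the datasprint notebooks: the function name `top_words`, its parameters, and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def top_words(text, n=3, min_length=4):
    """Return the n most frequent words of at least min_length characters."""
    # \w matches Unicode letters in Python 3, so Danish æ, ø, and å are kept.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if len(t) >= min_length)
    return counts.most_common(n)


# Invented sample sentence, not a quotation from the newspapers.
sample = "Kongen kom til Aarhus. Kongen talte i Aarhus, og folket hilste Kongen."
print(top_words(sample, n=2))
```

The length filter is a crude stand-in for a stop word list; a real experiment would more likely remove common Danish function words explicitly.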
Below are links to a CSV file containing newspaper texts from the period 1830 to 1870: