138 Years of Popular Science

Lost in the image records are the steps that involved the data, and there were a lot of them. The archive was text produced by an OCR (optical character recognition) process, and it was incredibly messy. To make matters worse, the file names for each issue were machine-generated and did not correspond to the actual date order of the documents. We spent a great deal of our time cleaning up this data and compiling customized datasets (many of which never ended up getting used).
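To give a rough sense of what that kind of cleanup can look like, here is a minimal Python sketch. It is not the code we actually ran. The function names (`clean_ocr_text`, `guess_issue_date`, `order_issues`) are hypothetical, and it assumes each issue's text contains a "Month YYYY" string somewhere, which may not hold for a real archive.

```python
import re
from datetime import date

# Month names mapped to numbers, used to recognise "March 1925"-style dates.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

DATE_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def clean_ocr_text(raw):
    """Apply a few generic OCR cleanups (an illustrative, lossy subset)."""
    # Rejoin words hyphenated across a line break: "Sci-\nence" -> "Science".
    # Caveat: this also merges genuine hyphenated compounds split at a line end.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Replace anything outside printable ASCII with a space.
    # Assumption: the text is English and non-ASCII characters are debris.
    text = re.sub(r"[^\x20-\x7E\n\t]", " ", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def guess_issue_date(text):
    """Return the first 'Month YYYY' found in the text as a date, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    return date(int(match.group(2)), MONTHS[match.group(1).lower()], 1)


def order_issues(issues):
    """Sort file names by the date guessed from their text, not by name.

    `issues` maps a machine-generated file name to its raw OCR text.
    Issues with no recognisable date are placed at the end.
    """
    keyed = []
    for name, raw in issues.items():
        guessed = guess_issue_date(clean_ocr_text(raw))
        keyed.append((guessed is None, guessed or date.max, name))
    return [name for _, _, name in sorted(keyed)]
```

Taking the first date that appears in the text is a crude heuristic: an issue that mentions an earlier date before its own cover date would be misplaced, so anything like this would need spot checks against the real issues.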

Posted 10 months ago