I requested to download all the info Flickr has about me on Tuesday, got the data on Wednesday, and yesterday I spent a couple of hours trying to figure out what I got.
- There were 14 zip files, each containing one folder.
- Total of 6.59GB, 12,729 files.
- 13 of the folders contain JPG images.
- One folder, named 72157704082859561_da410b4a23fb_part1, contained only JSON files, 36.8MB, 6366 files. The vast majority of the JSON files have names like photo_6594805.json. I've uploaded one of the files to the Scripting News repo, so you can see what kind of data you get about each image.
- Each of the 13 image folders contains files with names like dsc02455jpg_76330356_o.jpg. Unfortunately the dates on the files are all the same, so they don't say when the photos were taken. Real dates would be helpful in piecing things together. It appears the first part of the name, before the first underscore, is derived from the name of the file. Between the first and second underscores is a number, which appears to be the identifier of the picture. I believe this is the connection between the image and the JSON file, above.
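To make the naming scheme concrete, here is a minimal Python sketch that pulls a file name like dsc02455jpg_76330356_o.jpg apart. The function names parse_image_name and json_name_for are made up for illustration. It assumes the ID is the second-to-last underscore-separated piece, which matches the example above, and that the matching JSON file is named photo_ followed by that ID, which is a guess and not something Flickr documents here.

```python
from pathlib import Path


def parse_image_name(filename):
    """Split 'dsc02455jpg_76330356_o.jpg' into (name part, photo ID, suffix).

    Returns None when the name does not fit the pattern.
    """
    stem = Path(filename).stem           # drop the ".jpg" extension
    parts = stem.rsplit("_", 2)          # split from the right, at most twice
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    name_part, photo_id, suffix = parts
    return name_part, photo_id, suffix


def json_name_for(photo_id):
    """Assumed name of the JSON file for a photo ID (hypothetical mapping)."""
    return f"photo_{photo_id}.json"


if __name__ == "__main__":
    parsed = parse_image_name("dsc02455jpg_76330356_o.jpg")
    print(parsed)
    if parsed is not None:
        print(json_name_for(parsed[1]))
```

Splitting from the right keeps the ID intact even if the first part of the name happens to contain an underscore of its own.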
I loaded the JSON files into a section of a Frontier object database, and the names of the JPG files into another section. I wrote a script that looped over all the data from the JSON files, got the ID from each, and then checked if there was an image to go with that ID. 5816 IDs had image files and 533 didn't. Not bad, but far from perfect. Here's a list of IDs for which there were no images.
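The same check can be sketched in Python instead of Frontier. This is not the script I ran, just an outline under a few assumptions: the ID is taken from the JSON file name (photo_6594805.json) rather than from inside the file, the image ID is the second-to-last underscore-separated piece of the JPG name, and the function names split_by_image and scan are invented for the example.

```python
import re
from pathlib import Path

# Assumed pattern for the per-photo JSON files, e.g. photo_6594805.json
JSON_NAME = re.compile(r"^photo_(\d+)\.json$")


def id_from_image_name(name):
    """Return the numeric ID from a name like dsc02455jpg_76330356_o.jpg."""
    parts = Path(name).stem.rsplit("_", 2)
    if len(parts) == 3 and parts[1].isdigit():
        return parts[1]
    return None


def split_by_image(json_names, image_names):
    """Return (found, missing): IDs with an image file, and IDs without one."""
    image_ids = {id_from_image_name(n) for n in image_names}
    image_ids.discard(None)
    found, missing = [], []
    for name in json_names:
        m = JSON_NAME.match(name)
        if m is None:
            continue  # skip JSON files that are not per-photo records
        photo_id = m.group(1)
        (found if photo_id in image_ids else missing).append(photo_id)
    return found, missing


def scan(export_root):
    """Walk an unzipped export folder and run the check on what is there."""
    root = Path(export_root)
    json_names = [p.name for p in root.rglob("*.json")]
    # On case-sensitive file systems this only matches lowercase .jpg
    image_names = [p.name for p in root.rglob("*.jpg")]
    return split_by_image(json_names, image_names)
```

Calling scan on the folder where all 14 zip files were unpacked would give the two lists, and the lengths of those lists are the counts reported above.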
I haven't tried to generate a static site using this info, but it seems you could. It would be missing some images, about 8 percent (533 of 6349). Even so, it would have been really nice if Flickr had delivered it in that format. It's nice to have the JSON files, but most users won't be able to use them unless someone writes software to read them, and they find it.