How to Grab Images from Clickpic

A Summertown artist recently asked me to build her a WordPress site to show off her paintings. As with the other sites I’ve blogged about, I installed WordPress for her at Mythic Beasts, discussed the design of the site and chose a theme, and built a prototype showing the pages we thought we’d need together with some sample pictures showing how we’d display the art. And I then had to put the art itself on the site.

The artist, Penny, didn’t have up-to-date copies of her images at home, but she had set up an account at Clickpic, a service which enables photographers to build their own sites by filling in templates and uploading photos. Or paintings, in Penny’s case. She had set up four galleries: Chinese Brush, Landscape, Portrait, and Impressionist. Each gallery occupied two pages of thumbnails (a main page with about fifty, and an overflow page with about twenty), so altogether there were eight pages of thumbnails.

Of course, the thumbnails on their own aren’t much use. But some poking around showed that each thumbnail had the suffix _thumb in its name, and also had a full-size counterpart with the same name except for the _thumb. So I knew where everything was, which meant I should be able to download it.
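To make the naming rule concrete, here is a minimal sketch of turning a thumbnail URL into the URL of its full-size counterpart. The function name full_size_url and the example.com URL are made up for illustration; they are not taken from Penny's galleries.

```php
<?php
// Given the URL of a thumbnail, return the URL of the
// full-size image, assuming the only difference is the
// "_thumb" in the name. Returns null if the URL doesn't
// contain "_thumb", i.e. doesn't look like a thumbnail.
function full_size_url( $thumb_url )
{
  if ( strpos( $thumb_url, '_thumb' ) === false ) {
    return null;
  }
  return str_replace( '_thumb', '', $thumb_url );
}

// Hypothetical URL, for illustration only.
echo full_size_url( "http://example.com/images/lotus_thumb.jpg" ) . "\n";
// Prints http://example.com/images/lotus.jpg
```

Returning null for names without _thumb is just one way of telling thumbnails apart from other images on a page, such as logos.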

This arrangement was regular enough that it was definitely worth writing a program, rather than doing something like 280 downloads manually. So I wrote the PHP script below, which I’m reproducing in case it’s of use to anyone else. I’ve commented it, so I don’t think I need to explain much else. The only thing I’ll mention is that I based it on S. Visser’s answer to the Stack Overflow discussion “scraping all images from a website using DOMDocument”. DOMDocument is a PHP class which represents HTML pages in a way that makes it easy to (for example) loop over them and inspect all the images and their URLs. Which is what I’m doing below.
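As a warm-up before the full script, here is a minimal sketch of what DOMDocument gives you: it parses a string of HTML and lets you loop over the img elements and read their src attributes. The helper name image_sources and the sample markup are made up for illustration and are not part of the script itself.

```php
<?php
// Return the src attribute of every <img> in the HTML
// string $html, in document order.
function image_sources( $html )
{
  libxml_use_internal_errors( true );
  // Real-world HTML is rarely valid. This stops the
  // parser printing a warning for every problem.

  $dom = new DOMDocument;
  $dom->loadHTML( $html );

  $sources = array();
  foreach ( $dom->getElementsByTagName( 'img' ) as $img ) {
    $sources[] = $img->getAttribute( 'src' );
  }
  return $sources;
}

// A tiny stand-in for a gallery page (made-up markup).
$page = '<html><body>'
      . '<img src="images/lotus_thumb.jpg" alt="Lotus">'
      . '<img src="images/heron_thumb.jpg" alt="Heron">'
      . '</body></html>';

print_r( image_sources( $page ) );
```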

<?php

/* grab_images.php */


/*
Grabs the images in each of the four galleries
on Penny's account at http://farm7.clik.com/grantpgw/ .
These are named Chinese Brush, Landscape,
Portrait, and Impressionist. Puts them into
corresponding subdirectories here.
*/


// Copy the image at $url into a file
// with the same basename in the subdirectory
// $dir .
//
function grab( $url, $dir ) 
{
  global $grabs;
  // For recording grabbed images in
  // the order I grab them.
  // I append an IMG for the image to this,
  // followed by its name.

  $image =  file_get_contents( $url );
  // Image contents.

  if ( $image === FALSE ) {
    echo "Can't find image " . $url . "\n";
    return -1;
  }

  $parsed = parse_url( $url );
  // URL split into its components.

  $path = $parsed[ 'path' ];
  // The file path.

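  $saveto = $dir . '/' . basename( $path );
  // Where to save: the subdirectory plus the
  // last component of the path, i.e. the basename.

  if ( !is_dir( $dir ) ) {
    mkdir( $dir, 0777, true );
  }
  // file_put_contents() won't create missing
  // directories, so make sure $dir exists.

  file_put_contents( $saveto, $image );
  // Copy the image there.

  $grabs = $grabs .
           "<img src=\"" . $saveto . "\">\n<BR>" .
           $saveto . "\n<BR><BR>";
  // Append the image and its name
  // to the record of images grabbed.
}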


// Copy all images from the gallery on
// $page_url into subdirectory $dir .
//
function grab_gallery( $page_url, $dir )
{
  $html = file_get_contents( $page_url );
  // Get the HTML of the page at $page_url .
  // This is one of Penny's gallery pages.

  if ( $html === FALSE ) {
    echo "Can't fetch page " . $page_url . "\n";
    return;
  }

  libxml_use_internal_errors( true );
  // Don't print a warning for every bit of
  // invalid HTML on the page.

  $dom = new DOMDocument;
  $dom->loadHTML($html);
  // Convert to DOM: see 
  // "scraping all images from a website using DOMDocument" ,
  // http://stackoverflow.com/questions/15895773/scraping-all-images-from-a-website-using-domdocument

  $images = $dom->getElementsByTagName('img');
  // Get a list of all images on the page.

  foreach ($images as $image) {
    $src = $image->getAttribute('src');
    echo $src . "\n";
    grab( $src, $dir );

    $src_big = str_replace( "_thumb", "", $src );
    if ( $src_big !== $src ) {
      echo $src_big . "\n";
      grab( $src_big, $dir );
    }
  }
  // Loop over the list, saving the thumbnail
  // and its corresponding full-size image.
  // By inspection, I see the latter have the same
  // name as the former but without '_thumb'.
  // Images with no '_thumb' in their name (logos
  // and the like) are only fetched once.
}


grab_gallery( "http://farm7.clik.com/grantpgw/gallery_378821.html"
            , "images/chinese_brush"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_378821_2.html"
            , "images/chinese_brush"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_196828.html"
            , "images/landscape"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_196828_2.html"
            , "images/landscape"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_195771.html"
            , "images/portrait"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_195771_2.html"
            , "images/portrait"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_195453.html"
            , "images/impressionist"
            );

grab_gallery( "http://farm7.clik.com/grantpgw/gallery_195453_2.html"
            , "images/impressionist"
            );
?>
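
In the listing as shown, grab() builds up the record in $grabs, but nothing writes it out. Here is a minimal sketch of one way to save such a record as an HTML page for checking by eye. The function name save_record, the file name grabbed.html, and the sample fragment are all assumptions for illustration.

```php
<?php
// Wrap a fragment of HTML (such as the $grabs string
// built up by grab()) in a minimal page and save it
// to $file. Returns true on success, false on failure.
function save_record( $fragment, $file )
{
  $page = "<!DOCTYPE html>\n<html>\n<body>\n"
        . $fragment
        . "</body>\n</html>\n";
  return file_put_contents( $file, $page ) !== false;
}

// Made-up fragment, roughly the shape grab() appends.
$record = "<img src=\"images/landscape/example.jpg\">\n<BR>"
        . "images/landscape/example.jpg\n<BR><BR>";

// Written to the temporary directory here so the sketch
// doesn't clutter anything.
$file = sys_get_temp_dir() . '/grabbed.html';
save_record( $record, $file );
```

In practice the page would need to sit in the same directory as the script, because the image paths recorded in it are relative to that directory.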
