Google Books allows viewing the scans in colour, but when I click the option to download the PDF, I am provided only with a black-and-white version.

Is there a known way to obtain the original colour images, other than using Inspect Element on each page one by one?

  • bela@lemm.ee
    1 year ago

    I just spent a bit too much time making this (it was fun), so don’t even tell me if you’re not going to use it.

    Open the book’s page, run this first script in your browser’s developer console, and then scroll through the book:

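    // Collect the URL of every page scan Google Books has rendered so far.
    let imgs = new Set();
    
    function cheese() {
      for (const img of document.getElementsByTagName("img")) {
        // Page scans sit two levels below an element with the class "pageImageDisplay".
        // Optional chaining (?.) avoids a crash on images that have no grandparent.
        if (img.parentElement?.parentElement?.classList.contains("pageImageDisplay") && img.src) {
          imgs.add(img.src);
        }
      }
    }
    
    // Re-scan every 5 ms so that pages loaded while you scroll are picked up.
    // Stop it with clearInterval(collector), or just refresh the page.
    const collector = setInterval(cheese, 5);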
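
    The Set keeps URLs in the order they were first seen, so pages you scroll back to later end up out of order in the downloads. A minimal sketch for putting them back in order, assuming the page IDs in the `pg` parameter look like capital letters followed by a number (e.g. `PA12`); `sortPageUrls` and `parsePageId` are hypothetical helpers, not anything Google Books provides:

```javascript
// Split a page ID such as "PA12" into its letter prefix ("PA") and number (12).
// Assumption: IDs start with capital letters followed by digits; anything else
// (or a missing "pg" parameter) yields null and is sorted to the end.
function parsePageId(id) {
  const match = /^([A-Z]+)(\d+)/.exec(id ?? "");
  return match ? { prefix: match[1], num: Number(match[2]) } : null;
}

// Return the image URLs sorted by their "pg" parameter. Prefixes keep the order in
// which they were first seen; pages within a prefix are sorted numerically.
// The base only resolves relative URLs, so its host does not affect the result.
function sortPageUrls(urls, base = "https://books.google.com/") {
  const keyOf = (url) => parsePageId(new URL(url, base).searchParams.get("pg"));
  const entries = [...urls].map((url) => ({ url, key: keyOf(url) }));

  const prefixOrder = [];
  for (const { key } of entries) {
    if (key && !prefixOrder.includes(key.prefix)) prefixOrder.push(key.prefix);
  }

  entries.sort((a, b) => {
    // Unparseable entries go last; Array.prototype.sort is stable, so they keep their order.
    if (!a.key || !b.key) return (a.key ? 0 : 1) - (b.key ? 0 : 1);
    return (
      prefixOrder.indexOf(a.key.prefix) - prefixOrder.indexOf(b.key.prefix) ||
      a.key.num - b.key.num
    );
  });
  return entries.map((entry) => entry.url);
}
```

    In the console, something like `imgs = new Set(sortPageUrls(imgs, location.href));` would reorder the collection before you run the download script. Prefixes keep the order in which you first reached them, rather than guessing how Google orders prefixes such as `PR` and `PA`.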
    

    Once you’re done, run this script to download each image:

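    // Fetch an image and return a temporary blob: URL (object URL) pointing at its data.
    async function toBlobURL(url) {
      const response = await fetch(url);
      if (!response.ok) throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
      return URL.createObjectURL(await response.blob());
    }
    
    // Name each file after the URL's "pg" parameter (e.g. "PA12"), or a counter if it has none.
    function pageName(url, index) {
      return new URL(url, location.href).searchParams.get("pg") ?? `page-${index}`;
    }
    
    // Your browser may ask for permission to download multiple files; allow it.
    async function downloadAll() {
      let index = 0;
      for (const img of imgs) {
        const href = await toBlobURL(img);
        const name = pageName(img, index++);
        console.log(name);
    
        // Clicking a temporary <a download> link makes the browser save the blob as a file.
        const a = document.createElement("a");
        a.href = href;
        a.download = name;
        document.body.appendChild(a);
        a.click();
        document.body.removeChild(a);
    
        // Release the blob after giving the download time to start.
        setTimeout(() => URL.revokeObjectURL(href), 10000);
      }
    }
    
    downloadAll();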
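
    The files are named after their page IDs, so `PA10` lands before `PA2` in anything that compares names character by character (a shell glob such as `PA*`, for instance), which matters if you want to stitch the images back into a PDF. A small sketch of zero-padding the number; `paddedPageName` is a hypothetical helper and assumes IDs of the form capital letters then digits:

```javascript
// Zero-pad the number in a page ID ("PA7" -> "PA0007") so that a plain
// character-by-character sort lists the files in page order.
// Assumption: IDs are capital letters followed by digits, optionally followed
// by other text, which is kept as-is. Anything else is returned unchanged.
function paddedPageName(pageId, width = 4) {
  const match = /^([A-Z]+)(\d+)(.*)$/.exec(pageId ?? "");
  if (!match) return pageId;
  const [, prefix, digits, rest] = match;
  // padStart never truncates, so numbers longer than `width` stay intact.
  return prefix + digits.padStart(width, "0") + rest;
}
```

    Setting `a.download = paddedPageName(name);` in the download loop would apply it. Padding only fixes the order within one prefix; pages with different prefixes (say `PR` and `PA`) still sort by their letters.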
    

    Alternatively, you can simply run something like this to print the links:

    for (const img of imgs) {
      console.log(img);
    }
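
    Logging one line per image is awkward to copy out of the console. A sketch that joins everything into one block of text; `linksAsText` is a hypothetical helper, while `copy()` is a console utility that Chrome’s and Firefox’s developer tools provide (it doesn’t exist in regular page scripts):

```javascript
// Join the collected image URLs into one newline-separated string, skipping
// any empty entries, so the whole list can be copied in one go.
function linksAsText(urls) {
  return [...urls].filter(Boolean).join("\n");
}
```

    Then `copy(linksAsText(imgs))` in the console puts the whole list on the clipboard, one URL per line.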
    

    There’s stuff you can tweak, of course, if it doesn’t quite work for you. It worked fine in my tests.

    If you notice a page is missing, you should be able to scroll back to it and run the download script again to get everything. The first script keeps collecting pages until you refresh the site, which also means you should refresh once you’re done downloading, as it eats CPU for breakfast.
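
    To see which pages are missing without paging through the downloads, something like this could list the gaps in a run of numbered pages. It assumes body pages use IDs like `PA1`, `PA2` and so on in the `pg` parameter, numbered consecutively; `missingPages` is a hypothetical helper. A gap may also be a page Google simply doesn’t show in a limited preview, in which case scrolling back won’t help.

```javascript
// List the page IDs missing from a run of consecutively numbered pages.
// Assumption: IDs are a capital-letter prefix plus a number, e.g. "PA12"; IDs with
// other prefixes, extra text after the number, or no value at all are ignored.
function missingPages(pageIds, prefix = "PA") {
  const seen = new Set();
  for (const id of pageIds) {
    const match = /^([A-Z]+)(\d+)$/.exec(id ?? "");
    if (match && match[1] === prefix) seen.add(Number(match[2]));
  }
  if (seen.size === 0) return [];

  // Every number between the lowest and highest page seen should be present.
  const first = Math.min(...seen);
  const last = Math.max(...seen);
  const missing = [];
  for (let n = first; n <= last; n++) {
    if (!seen.has(n)) missing.push(prefix + n);
  }
  return missing;
}
```

    For example, `missingPages([...imgs].map((url) => new URL(url, location.href).searchParams.get("pg")))` returns the IDs of the pages to scroll back to.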

    Oh, and NEVER RUN ANY JAVASCRIPT CODE SOMEONE ON THE INTERNET TELLS YOU TO RUN.

    • antonim@lemmy.dbzer0.com (OP)
      1 year ago

      Well, I may be technologically semi-literate and I may have felt a bit dizzy when I saw actual code in your comment, but I sure as hell will find a way to put it to use, no matter the cost.

      You’re terrific, man. No idea what else to say.