The feature I wrote this for is the option to only download images from the thread that are newer than a specified image (i.e. the last image you already downloaded, but maybe deleted)!
4cdl https://boards.4chan.org/hr/thread/2372933
Download images from a thread based on a regular expression applied to the uploader's filename:
4cdl https://boards.4chan.org/hr/thread/2372933 -i 'sunlight'
Good for /hr/.
Usage: 4cdl [OPTION]... THREADURL
Grabs images from 4chan threads, maintaining new and original filenames.
Only new images are downloaded. The last known post is written to the
status file "4cdl.status"

  -s, --[no-]sauce        saves the raw HTML of the thread and URL in two
                          additional files. This is the default.
  -i, --image=IMG_REGEX   download all images where the uploader's filename
                          matches IMG_REGEX. This option disables the
                          implicit --source.
      --debug             print debugging output
  -h, --help              display this help
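To make the regex option a bit more concrete, here is a minimal Python sketch of that kind of filter. It is not the actual 4cdl code: the function name `select_by_regex`, the example uploader filenames, and the choice of an unanchored `re.search` are all assumptions made for illustration.

```python
import re

def select_by_regex(images, pattern):
    """Keep the (server_name, uploader_name) pairs whose uploader name matches.

    Assumption: the pattern may match anywhere in the uploader's filename.
    """
    rx = re.compile(pattern)
    return [(server, orig) for server, orig in images if rx.search(orig)]

# Hypothetical thread contents: server-side name plus the uploader's filename.
images = [
    ("1427645049883.jpg", "sunlight_forest.jpg"),
    ("1428092844737.jpg", "city_at_night.jpg"),
]

selected = select_by_regex(images, "sunlight")
print(selected)
```

Matching on the uploader's filename rather than the server-side name is the point: the server-side name is only a number, so it carries nothing a pattern like 'sunlight' could hit.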
4cdl https://boards.4chan.org/hr/thread/2372933
Download all images from a thread that were uploaded after 1428092844737.jpg:
4cdl https://boards.4chan.org/hr/thread/2372933 \
  -m https://i.4cdn.org/hr/1428092844737.jpg
Download specific images from a thread:
4cdl https://boards.4chan.org/hr/thread/2372933 \
  -i https://i.4cdn.org/hr/1427645049883.jpg \
  -i https://i.4cdn.org/hr/1428092844737.jpg
Good for /hr/.
usage: 4cdl [-h] [-s] [-m MINFILE] [-i IMAGE] [-I IMG_REGEX] THREADURL

Grabs images from 4chan threads, maintaining new and original filenames.

positional arguments:
  THREADURL             URL for a 4chan thread

optional arguments:
  -h, --help            show this help message and exit
  -s, --source          saves the raw HTML of the thread and URL in two
                        additional files. This is the default.
  -m MINFILE, --minfile MINFILE
                        load only images newer than MINFILE. MINFILE can be
                        an image url or the filename.
  -i IMAGE, --image IMAGE
                        download specific IMAGE from THREADURL. Can be given
                        multiple times. This option disables the implicit
                        --source. IMAGE can be an image URL or the filename.
  -I IMG_REGEX, --img-regex IMG_REGEX
                        Download all images where the uploader's filename
                        matches IMG_REGEX
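The "newer than MINFILE" logic can be sketched roughly like this. It is an illustration and not the script's real implementation: it assumes the numeric part of the server-side filename (e.g. 1428092844737) grows with upload time, and the helpers `stem_number` and `newer_than` are made-up names.

```python
import posixpath
from urllib.parse import urlparse

def stem_number(name_or_url):
    """Return the numeric stem of an image URL or a bare filename."""
    path = urlparse(name_or_url).path        # a bare filename is returned unchanged
    base = posixpath.basename(path)          # "1428092844737.jpg"
    return int(posixpath.splitext(base)[0])  # 1428092844737

def newer_than(names, minfile):
    """Keep only the names whose numeric stem is strictly greater than MINFILE's."""
    threshold = stem_number(minfile)
    return [n for n in names if stem_number(n) > threshold]

# Hypothetical list of server-side filenames found in a thread.
names = ["1427645049883.jpg", "1428092844737.jpg", "1428100000000.jpg"]
result = newer_than(names, "https://i.4cdn.org/hr/1428092844737.jpg")
print(result)
```

Because MINFILE is reduced to its numeric stem first, it makes no difference whether you pass the full image URL or just the filename.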
Good for fetching the posts of a dying thread of a busy board or for archiving a complete thread. Unfortunately wget usually fucks up (the CSS yesterday, today the img-tags), so things have to be corrected manually.
And if DDOS protection strikes, change the user-agent in ~/.wgetrc.
The files are not really renamed, but hardlinked into the current directory. The “original” files can just be deleted after inspecting the images; until then, the links can be re-created if you made an error.
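For anyone unfamiliar with hardlinks, here is a small Python sketch of the idea. It is not the 4chanrn script itself: `link_with_name` and the filenames are invented for the demo, which runs in a throwaway temp directory.

```python
import os
import tempfile

def link_with_name(src, dst_dir, new_name):
    """Hardlink src into dst_dir under new_name and return the new path."""
    dst = os.path.join(dst_dir, new_name)
    os.link(src, dst)  # second directory entry for the same data, no copy
    return dst

# Demo with made-up filenames in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "1428092844737.jpg")
    with open(src, "wb") as f:
        f.write(b"not really a jpeg")

    dst = link_with_name(src, tmp, "sunlight.jpg")
    same = os.path.samefile(src, dst)   # both names point at the same file

    os.remove(src)                      # delete the "original"
    with open(dst, "rb") as f:
        survived = f.read() == b"not really a jpeg"

    print(same, survived)
```

Since both names refer to the same data, deleting the "original" afterwards costs nothing, and as long as it still exists a wrong link can simply be removed and created again.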
# download complete thread
4chandl https://boards.4chan.org/b/thread/503290493
# after it 404'd create hardlinks with proper filenames in a new directory
mkdir x
cd x
4chanrn ../503290493.html
As the filenames are taken from the HTML, this script will break (again) if the structure of the HTML changes,
$ 4c https://www.example.com/originalfilename.jpg "demonstration"
$ ls
originalfilename (demonstration).jpg
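The naming scheme in that example, with the comment placed in parentheses before the extension, could be reproduced like this. It only illustrates how the target filename is built, inferred from the output above; `commented_name` is a hypothetical helper, and the download itself is left out.

```python
import posixpath
from urllib.parse import urlparse

def commented_name(url, comment):
    """Build 'stem (comment).ext' from the last path component of url."""
    base = posixpath.basename(urlparse(url).path)   # "originalfilename.jpg"
    stem, ext = posixpath.splitext(base)            # ("originalfilename", ".jpg")
    return f"{stem} ({comment}){ext}"

name = commented_name("https://www.example.com/originalfilename.jpg", "demonstration")
print(name)
```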