4chan thread- and image-downloaders

Tools for leeching from 4chan.

4cdl (ruby)

4cdl.rb grabs images from 4chan threads, maintaining new and original filenames. It uses the 4chan API.
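
As a rough illustration of what "uses the 4chan API" can look like, here is a minimal Python sketch (not the Ruby code of 4cdl.rb itself). It assumes the read-only JSON endpoint at a.4cdn.org and the post fields `tim`, `ext` and `filename`. The function names `api_url` and `image_names` are made up for this example.

```python
import re

# Thread URLs look like https://boards.4chan.org/<board>/thread/<number>
THREAD_RE = re.compile(r"boards\.4chan\.org/(\w+)/thread/(\d+)")


def api_url(thread_url):
    """Map a thread URL to the JSON endpoint of the read-only API."""
    match = THREAD_RE.search(thread_url)
    if match is None:
        raise ValueError("not a 4chan thread URL: " + thread_url)
    board, number = match.groups()
    return "https://a.4cdn.org/{}/thread/{}.json".format(board, number)


def image_names(board, post):
    """Return (image URL, uploader's filename), or None for posts without an image."""
    if "tim" not in post:
        return None
    # "tim" is the new (server-side) name, "filename" the uploader's original name
    url = "https://i.4cdn.org/{}/{}{}".format(board, post["tim"], post["ext"])
    return url, post["filename"] + post["ext"]
```

The JSON document returned by that endpoint carries a `posts` list. Each entry with an attached image would go through `image_names` to get both the new and the original filename.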

The feature I wrote this for is the option to download only those images from a thread that are newer than a specified image (i.e. the last image you already downloaded but may have since deleted).
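
The "newer than a specified image" idea can be sketched in Python as follows. This is an assumption about how such a filter might work, not the actual implementation: it relies on the numeric filenames 4chan assigns (the `tim` field of a post in the API) growing over time. The names `image_id` and `newer_than` are hypothetical.

```python
def image_id(minfile):
    """Extract the numeric upload ID from an image URL or a bare filename."""
    name = minfile.rsplit("/", 1)[-1]   # drop everything before the last slash
    stem = name.split(".", 1)[0]        # drop the extension
    return int(stem)


def newer_than(posts, minfile):
    """Keep only posts whose image was uploaded after minfile."""
    threshold = image_id(minfile)
    # posts without a "tim" field carry no image and are skipped
    return [post for post in posts if "tim" in post and post["tim"] > threshold]
```

Because only the number in the filename is compared, the reference image does not have to exist on disk any more.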

Examples

Download all images from a thread that were not already downloaded. The state is saved in a file in $PWD/.
4cdl https://boards.4chan.org/hr/thread/2372933
Download images from a thread based on a regular expression applied to the uploader's filename:
4cdl https://boards.4chan.org/hr/thread/2372933 -i 'sunlight'
Good for /hr/.
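
Filtering by the uploader's filename might look roughly like this in Python. Whether the real option matches anywhere in the name, includes the extension, or ignores case is not stated here, so the sketch simply assumes a case-sensitive search over name plus extension. `matching_images` is a made-up name.

```python
import re


def matching_images(posts, img_regex):
    """Return the posts whose original filename matches img_regex."""
    pattern = re.compile(img_regex)
    selected = []
    for post in posts:
        if "filename" not in post:      # post without an image
            continue
        original = post["filename"] + post.get("ext", "")
        if pattern.search(original):    # assumption: match anywhere in the name
            selected.append(post)
    return selected
```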

usage

Usage: 4cdl [OPTION]... THREADURL
Grabs images from 4chan threads, maintaining new and original filenames.

Only new images are downloaded. The last known post is written to the status file "4cdl.status"

    -s, --[no-]sauce                 saves the raw HTML of the thread and URL in two additional files. This is the default.
    -i, --image=IMG_REGEX            download all images where the uploader's filename matches IMG_REGEX. This option disables the implicit --source.
        --debug                      print debugging output
    -h, --help                       display this help
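
The status file mentioned in the help text can be pictured with a small Python sketch. The real format of "4cdl.status" is not documented here, so this assumes it holds nothing but the number of the last known post. All function names are hypothetical.

```python
import os

STATUS_FILE = "4cdl.status"


def read_last_post(directory="."):
    """Return the last known post number, or 0 if there is no usable status file."""
    path = os.path.join(directory, STATUS_FILE)
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return 0


def write_last_post(number, directory="."):
    """Remember the newest post that has been processed."""
    with open(os.path.join(directory, STATUS_FILE), "w") as handle:
        handle.write("{}\n".format(number))


def new_posts(posts, directory="."):
    """Keep only posts that are newer than the stored post number."""
    last = read_last_post(directory)
    return [post for post in posts if post["no"] > last]
```

After a run, `write_last_post` would be called with the highest post number seen, so the next run in the same directory skips everything up to that post.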

4cdl (python)

4cdl.py is almost identical to the ruby version. See the 4cdl (ruby) section above for a general description.

Examples

Download all images from a thread that were not already downloaded. The state is saved in a file in $PWD/.
4cdl https://boards.4chan.org/hr/thread/2372933
Download all images from a thread that were uploaded after 1428092844737.jpg:
4cdl https://boards.4chan.org/hr/thread/2372933 \
	-m https://i.4cdn.org/hr/1428092844737.jpg
Download specific images from a thread:
4cdl https://boards.4chan.org/hr/thread/2372933 \
	-i https://i.4cdn.org/hr/1427645049883.jpg \
	-i https://i.4cdn.org/hr/1428092844737.jpg
Good for /hr/.

usage

usage: 4cdl [-h] [-s] [-m MINFILE] [-i IMAGE] [-I IMG_REGEX] THREADURL

Grabs images from 4chan threads, maintaining new and original filenames.

positional arguments:
  THREADURL             URL for a 4chan thread

optional arguments:
  -h, --help            show this help message and exit
  -s, --source          saves the raw HTML of the thread and URL in two
                        additional files. This is the default.
  -m MINFILE, --minfile MINFILE
                        load only images newer than MINFILE. MINFILE can be an
                        image url or the filename.
  -i IMAGE, --image IMAGE
                        download specific IMAGE from THREADURL. Can be given
                        multiple times. This option disables the implicit
                        --source. IMAGE can be an image URL or the filename.
  -I IMG_REGEX, --img-regex IMG_REGEX
                        Download all images where the uploader's filename
                        matches IMG_REGEX
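
The usage text above is consistent with an argparse setup along these lines. This is a reconstruction from the help output, not the script's actual source. The helper `wants_source` is a hypothetical way to express "--source is the default, unless -i is given".

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="4cdl",
        description="Grabs images from 4chan threads, "
                    "maintaining new and original filenames.")
    parser.add_argument("threadurl", metavar="THREADURL",
                        help="URL for a 4chan thread")
    parser.add_argument("-s", "--source", action="store_true",
                        help="save the raw HTML of the thread and the URL")
    parser.add_argument("-m", "--minfile",
                        help="load only images newer than MINFILE")
    # action="append" lets -i be given multiple times
    parser.add_argument("-i", "--image", action="append", default=[],
                        help="download specific IMAGE from THREADURL")
    parser.add_argument("-I", "--img-regex",
                        help="download images whose original name matches")
    return parser


def wants_source(args):
    """--source is implied unless specific images were requested with -i."""
    return args.source or not args.image
```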

4chandl

4chandl is written in bash and will break the moment someone at 4chan decides to change the HTML (again). Until then, it downloads a complete thread, then goes into a loop and loads new images as the thread changes. All new posts are written to the console.

This script uses wget for downloading and feh to view new images.
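
The download-then-loop behaviour can be outlined in Python as below. This is not a translation of the bash script: fetching and parsing are left to a caller-supplied `fetch_posts` function (hypothetical, assumed to return a list of posts, or None once the thread is gone). The polling interval is also an assumption.

```python
import time


def watch(fetch_posts, handle_post, interval=60, sleep=time.sleep):
    """Poll a thread and pass each post to handle_post exactly once.

    Returns the number of posts seen when fetch_posts() reports that
    the thread no longer exists (by returning None).
    """
    seen = set()
    while True:
        posts = fetch_posts()
        if posts is None:               # thread 404'd, stop watching
            return len(seen)
        for post in posts:
            if post["no"] not in seen:  # only new posts are handled
                seen.add(post["no"])
                handle_post(post)
        sleep(interval)
```

In a real script, `handle_post` would print the post and download its image. Passing `sleep` as a parameter only serves to make the loop testable without waiting.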

Good for fetching the posts of a dying thread on a busy board or for archiving a complete thread. Unfortunately, wget usually fucks up (the CSS yesterday, today the img-tags), so things have to be corrected manually.

And if DDoS protection strikes, change the user-agent in ~/.wgetrc.

4chanrn

4chanrn renames images based on the original filenames the uploaders used. An HTML copy of the thread is required (see 4chandl).

The files are not really renamed, but hardlinked into the current directory. The “original” files can simply be deleted after inspecting the images. Until then, the links can be re-created if you made an error.
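
The hardlink approach can be sketched in Python like this. The mapping from stored filename to uploader's filename would come from the saved HTML. Here it is simply passed in as a dict, and `link_with_original_names` is a made-up name, not part of 4chanrn.

```python
import os


def link_with_original_names(names, source_dir, target_dir="."):
    """Create hardlinks named after the uploaders' filenames.

    names maps the stored filename to the original filename.
    Missing sources and already existing targets are skipped.
    """
    created = []
    for stored, original in names.items():
        source = os.path.join(source_dir, stored)
        target = os.path.join(target_dir, original)
        if not os.path.exists(source) or os.path.exists(target):
            continue
        os.link(source, target)         # second name for the same file data
        created.append(target)
    return created
```

Since both names point to the same data, deleting the downloaded file later leaves the link with the original name intact. Hardlinks require source and target to be on the same filesystem.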

Example

# download complete thread
4chandl https://boards.4chan.org/b/thread/503290493

# after it 404'd create hardlinks with proper filenames in a new directory
mkdir x
cd x
4chanrn ../503290493.html
As the filenames are taken from the HTML, this script will break (again) if the structure of the HTML changes.

4c

4c is a really simple bash script to download any file (not just from 4chan) and add a given text in parentheses to its name. It keeps the original name but adds a description.

Example

$ 4c https://www.example.com/originalfilename.jpg "demonstration"
$ ls
originalfilename (demonstration).jpg
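
The naming scheme shown in the example can be expressed in a few lines of Python. This covers only the filename logic, not the download, and `described_name` is a hypothetical helper rather than part of the bash script.

```python
import posixpath
from urllib.parse import urlsplit


def described_name(url, description):
    """Build 'name (description).ext' from the last path component of url."""
    name = posixpath.basename(urlsplit(url).path)
    stem, ext = posixpath.splitext(name)
    return "{} ({}){}".format(stem, description, ext)
```

Using `urlsplit` keeps a query string, if any, out of the resulting filename.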