Scraping a Twitter Account's Images Using TWINT

I wrote this at the request of a friend who is getting used to TWINT, but it should be useful for others as well.
If you don't already know, TWINT is one of the best Twitter scrapers out there, and it doesn't require an API key. I'll spare you the details, since others have covered the tool extensively in articles, videos, and podcasts; I've linked their work in the Resources section at the bottom.
https://github.com/twintproject

Step 1: Get TWINT set up and ready

Null Byte's tutorial (linked under Resources) will get you to the point where you can run:

twint --help

You'll then see the help text listing all of the available commands and options.

Step 2: Point TWINT at your target account, scrape its images, and save the data to a CSV file

(Screenshot: our target account)

twint -u visiteroda --images -o visiteroda_images.csv --csv

This command tells TWINT to look up the user 'visiteroda', keep only tweets that contain images (--images), and write the results to a new file called 'visiteroda_images.csv' (-o), formatted as CSV (--csv).

TWINT writes the file to the directory you ran the command from (in this case, the home directory), so that is where we will find it. You can open it in LibreOffice Calc.
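If you would rather sanity-check the file from a script than in a spreadsheet, Python's standard csv module is enough. This is a minimal sketch, assuming the export is comma-separated with a header row that includes a 'photos' column (the column used in Step 3); the helper name summarize_photos is hypothetical, not part of TWINT.

```python
import csv


def summarize_photos(lines):
    """Summarize a TWINT CSV export.

    `lines` is any iterable of text lines, such as an open file.
    Assumption: comma-separated, with a header row containing 'photos'.
    Returns (row_count, column_names, first_non_empty_photos_cell).
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    columns = reader.fieldnames or []
    # Skip rows whose photos cell is missing, blank, or an empty list "[]".
    first_photos = next(
        (row["photos"] for row in rows if row.get("photos") not in (None, "", "[]")),
        None,
    )
    return len(rows), columns, first_photos
```

For example: with open("visiteroda_images.csv", newline="", encoding="utf-8") as f: print(summarize_photos(f))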

Step 3: Format the 'photos' column so we can mass download the images

In LibreOffice, we can see that the 'photos' column holds the direct links to the images. However, each link is padded with brackets and single quotes, as in the example below. We will write a simple spreadsheet formula to extract the substring we need.

['https://pbs.twimg.com/media/EKkbRwEXUAEhc_g.jpg']

=MID(N2, 3, 47)
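To make the formula's arithmetic concrete, here is the same operation in Python. MID(text, start, length) counts characters from 1, so it corresponds to the 0-based slice text[start - 1 : start - 1 + length]. The function name mid is only an illustrative helper.

```python
def mid(text, start, length):
    """Mimic the spreadsheet MID function: `start` is 1-based."""
    return text[start - 1 : start - 1 + length]


cell = "['https://pbs.twimg.com/media/EKkbRwEXUAEhc_g.jpg']"
# Skips "['" (characters 1 and 2) and keeps the next 47 characters.
print(mid(cell, 3, 47))  # https://pbs.twimg.com/media/EKkbRwEXUAEhc_g.jpg
```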

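The image URLs in this export are 47 characters long, like the example above, so we need to strip the opening bracket and quote at the start of the string and the closing quote and bracket at the end. The MID function, which works the same way in LibreOffice Calc and Excel, takes care of this. First, create a new column next to the one we want to clean and paste the formula into its first cell (adjust N2 if your 'photos' column is elsewhere). The formula starts at the third character and keeps the next 47 characters, giving us the URL we want. Once it works for one cell, drag the fill handle at the bottom-right corner of that cell down to the last row to copy the formula to the entire column. Keep in mind that this fixed-length approach assumes every URL has the same length, and that it only extracts the first URL from a cell that lists more than one photo.

(Screenshot: the original 'photos' column next to the cleaned column)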
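If you prefer a script to the spreadsheet route, the brackets and quotes in the 'photos' cells look like a Python list literal, so each cell can be parsed with the standard library's ast.literal_eval instead of cut at a fixed length. That removes the 47-character assumption and picks up every URL in a cell that lists several photos. This is a sketch under those assumptions: the 'photos' column name and the images.txt file name come from this post, while extract_photo_urls and write_url_list are hypothetical helpers.

```python
import ast
import csv


def extract_photo_urls(lines):
    """Yield every image URL from the 'photos' column of a TWINT CSV export.

    Assumption: each photos cell is a Python-style list literal such as
    "['https://pbs.twimg.com/media/....jpg']".
    """
    for row in csv.DictReader(lines):
        cell = (row.get("photos") or "").strip()
        if not cell:
            continue
        try:
            urls = ast.literal_eval(cell)  # safely parse the list literal
        except (ValueError, SyntaxError):
            continue  # skip cells that are not a valid literal
        if isinstance(urls, str):
            urls = [urls]  # tolerate a bare string instead of a list
        if not isinstance(urls, (list, tuple)):
            continue
        for url in urls:
            if isinstance(url, str) and url:
                yield url


def write_url_list(lines, out_path):
    """Write unique URLs to out_path, one per line, in first-seen order.

    Returns the number of URLs written.
    """
    urls = list(dict.fromkeys(extract_photo_urls(lines)))  # drop duplicates
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
    return len(urls)
```

For example: with open("visiteroda_images.csv", newline="", encoding="utf-8") as f: write_url_list(f, "images.txt"). Dropping duplicate URLs is a design choice; remove the dict.fromkeys call if you want every occurrence kept.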

Step 4: Paste the URLs into a new text file and download them from the terminal

You will probably want to do this in a new folder. Paste the cleaned URLs into a text file, one per line, and save it as images.txt. Then open a terminal in that folder and run the following command to download the images automatically:

wget -i images.txt

Voila! You have all of the images in your new folder!

(Screenshots: the folder before and after running the wget command)
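If wget is not installed, a short Python script can do roughly what wget -i images.txt does. This sketch uses only the standard library: it reads images.txt, skips blank lines, and saves each URL under the last segment of its path (for example EKkbRwEXUAEhc_g.jpg). The function names are hypothetical, and it has none of wget's retry or resume options; a failed download is reported and skipped.

```python
import os
import urllib.parse
import urllib.request


def filename_for(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urllib.parse.urlparse(url).path)


def download_all(list_path, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Download every URL listed (one per line) in list_path into dest_dir.

    `fetch(url, filename)` defaults to urllib.request.urlretrieve; it is a
    parameter so it can be swapped out, e.g. for testing.
    Returns the list of local paths that were fetched.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if not url:
                continue  # ignore blank lines
            name = filename_for(url)
            if not name:
                continue  # no usable file name in the URL path
            target = os.path.join(dest_dir, name)
            try:
                fetch(url, target)
            except OSError as exc:  # URLError and HTTPError subclass OSError
                print(f"Failed to download {url}: {exc}")
                continue
            saved.append(target)
    return saved
```

Run download_all("images.txt") from the new folder. Files that share a name overwrite each other, which is not an issue for Twitter media IDs but worth knowing for other URL lists.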

Some Use Cases:

  • Quick download of an account's images - saves you a lot of "save as" clicks
  • Mass reverse image search - dragging and dropping images from individual tweets one at a time is tedious
  • Steganography and image analysis - you can now feed these images to any other image-processing tools you use

Resources:

https://null-byte.wonderhowto.com/how-to/mine-twitter-for-targeted-information-with-twint-0193853/
http://osintpodcast.com/209598/1373113-3-twint-an-osint-tool-for-collection-on-twitter-at-scale?play=true
