The ambition
I needed a way to get into Python programming. I wanted to learn some of the idiosyncrasies of the language, and the best way to learn is to throw yourself into a project. I frequent the content aggregator known as Reddit (I know, you've probably never heard of it before‽), but often while away from an internet connection I can use freely. Users of Reddit often find sharing .gifs of the animated variety reasonably entertaining, as do I. However, a single animated .gif is usually between 1MB and 20MB, which will push me over my meager data limit rather quickly.
Recently imgur, one of the popular image rehosting services for Reddit, introduced a "gifv" service: users upload a .gif and receive automatic conversion to webm and mp4, and the .gifv link serves whichever of the three formats suits the user's device and the size of the image.
This is fine and dandy, except a lot of users are stuck in the old .gif way despite webm and mp4 being supported in ALL major modern browsers. So users mistakenly post links directly to imgur's hosted .gif rather than the .gifv, which can often reduce the filesize by up to 10x. I click the link for the submission and wait for the gif to use 10-20MB loading a clip with subtitled voice-overs from some film where an actor said something funny. There goes 1/30th of my bandwidth for the month.
The issue here is twofold: either the user is unaware of imgur's automatic gifv conversion, or the user is stubborn and wants to stay with the archaic, proprietary image format from 1987. I can at least help with the first half.
Now to the problems
The idea is to find posts that link to imgur but end in .gif (without the v at the end), and if the submitter could save imgur and users some bandwidth, post a very short message explaining that and providing the .gifv link. Reddit publishes a public API and is okay with people running bots on the site, so this gives us our means. Our method will be the extremely helpful PRAW wrapper. PRAW doesn't do anything particularly special, but it saves me from fetching results from the Reddit API and interpreting them myself, which probably saved hours. It also manages the API's rate limiting for us, so we don't have to think about that either.
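The core check — is this an imgur link ending in .gif rather than .gifv — can be sketched as a small helper. The function name and regex are my own illustration, not the bot's actual code:

```python
import re

# Matches direct imgur .gif links, e.g. http://i.imgur.com/AbCd123.gif,
# but not .gifv. The pattern is illustrative, not exhaustive.
IMGUR_GIF = re.compile(r"^https?://(?:i\.)?imgur\.com/(\w+)\.gif$")

def gifv_link(url):
    """Return the .gifv equivalent of a direct imgur .gif URL, or None."""
    match = IMGUR_GIF.match(url)
    if match is None:
        return None
    return "http://i.imgur.com/{}.gifv".format(match.group(1))
```

With PRAW, each new submission's `url` attribute would be run through a check like this before deciding whether to comment.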
After writing some code, storing the results in an SQLite database for persistence across startups, and keeping track of which submissions I'd seen, I was mostly ready for a live test. I ran the system dry for some time: on each new submission whose gif was hosted on imgur, I would write to the console the comment that would have been posted.
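A minimal sketch of the seen-submission tracking, assuming a single-table schema of my own invention (the real bot's layout may differ):

```python
import sqlite3

# A file-backed database so the "seen" set survives restarts.
# The schema here is illustrative.
conn = sqlite3.connect("seen.db")
conn.execute("CREATE TABLE IF NOT EXISTS seen (submission_id TEXT PRIMARY KEY)")

def already_seen(submission_id):
    """True if this submission has been processed before."""
    row = conn.execute(
        "SELECT 1 FROM seen WHERE submission_id = ?", (submission_id,)
    ).fetchone()
    return row is not None

def mark_seen(submission_id):
    """Record a submission so it isn't handled twice across startups."""
    conn.execute(
        "INSERT OR IGNORE INTO seen (submission_id) VALUES (?)", (submission_id,)
    )
    conn.commit()
```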
What I noticed
- Some of the gifs people posted were so small that if you went to the .gifv version, it would just redirect you to the .gif anyway.
- Some of the gifs posted would let you go to the .gifv page without redirection but would display the .gif anyway.
- The .gifv landing page's JavaScript inlines the size of the original .gif and seems to display the webm/mp4 version if the original .gif is over roughly 2MB.
Great: if I get redirected, don't bother posting, as the gif is small enough. The next problem was the pages that displayed the gif anyway. The best way to handle this was to inspect the JSON they snuck into the .gifv page, which shows the size of the original .gif. After looking at 20 or so gifs, I determined that if it's under 2MB the page will usually show the gif; if above, it will show webm or mp4.
Instead of getting picky with the size of the gif, such as posting on all .gifs larger than 2MB, I just posted on gifs above 5MB, as they are far worse offenders than 2MB gifs.
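The decision logic above can be sketched as pure functions. The `"size"` key is my reading of the JSON imgur inlined at the time, not a documented API, and the function names are hypothetical; fetching the page (with redirects disabled so the tiny-gif redirect is detectable) is left to the caller:

```python
import re

SIZE_THRESHOLD = 5 * 1024 * 1024  # only post on gifs above ~5MB

def gif_size_from_page(html):
    """Pull the original .gif's byte size out of the JSON blob inlined
    in the .gifv landing page. The '"size"' key is a guess at the
    page's shape, not a documented imgur API."""
    match = re.search(r'"size"\s*:\s*(\d+)', html)
    return int(match.group(1)) if match else None

def worth_commenting(was_redirected, html):
    """Decide whether to post: skip tiny gifs (imgur redirects those
    straight back to the .gif) and anything under the threshold."""
    if was_redirected:
        return False
    size = gif_size_from_page(html)
    return size is not None and size > SIZE_THRESHOLD
```

In practice the caller would request the .gifv URL with redirects disabled, treat a 301/302 status as `was_redirected`, and pass the response body as `html`.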
Great, now the comment is only posting on large .gifs. The next thing I noticed was that even when the bot was only scraping around 15 subreddits, I was getting dozens of hits a minute. The new account I created on Reddit wasn't able to post more than once every 10 minutes. The only way to beat that limit is to make legitimately upvote-worthy comments in order to earn more comment karma. One problem I can see happening is people getting frustrated with the bot, as it does seem a little intrusive to users. So I feel I may need to build up a lot of comment karma on that account to ensure it doesn't dip into the one-comment-per-10-minutes range.
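Given dozens of hits a minute against a one-comment-per-10-minutes cap, the bot has to drop (or queue) most candidates. A minimal sketch of the dropping approach, with names of my own choosing:

```python
import time

MIN_INTERVAL = 10 * 60  # new accounts: roughly one comment per 10 minutes

class Throttle:
    """Drop candidate comments that arrive faster than a low-karma
    account is allowed to post. Queueing the best candidates instead
    of dropping would be the other obvious design."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable for testing
        self.last_post = None

    def allow(self):
        """True if enough time has passed to post again."""
        now = self.clock()
        if self.last_post is None or now - self.last_post >= self.min_interval:
            self.last_post = now
            return True
        return False
```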
And this is why the bot isn't currently running. But I may give it a shot in the future. Perhaps increasing the limit to 10MB will gather more favour from some Redditors and make managing karma easier.