The biggest issue many users run into with these pre-packaged solutions is that they rely on Reddit's API. Apparently, this limits the user to scraping 1,000 posts. For example, in the documentation for bulk-downloader-for-reddit, they note that:
In some cases (e.g., specific Redditors with fewer than 1,000 posts), this should not be an issue. However, if you are scraping the contents of an entire subreddit that is even moderately popular, you will almost certainly be affected by this limitation.
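To make the limitation concrete, here is a minimal Python simulation of how a capped listing behaves. It does not call Reddit at all: `fetch_page`, `scrape_listing`, the page size of 100, and the in-memory list of posts are all hypothetical stand-ins I made up for illustration, and the cap of 1,000 is simply the figure mentioned above.

```python
from typing import List, Optional, Tuple

LISTING_CAP = 1000  # assumed cap per listing (the figure mentioned above)
PAGE_SIZE = 100     # assumed page size for this simulation


def fetch_page(posts: List[int], after: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Hypothetical stand-in for one listing request.

    Returns one page of posts plus the cursor for the next page,
    or None as the cursor when the listing has nothing more to give.
    """
    start = 0 if after is None else after
    reachable = min(LISTING_CAP, len(posts))
    # The listing never serves anything past the cap, however many posts exist.
    end = min(start + PAGE_SIZE, reachable)
    page = posts[start:end]
    next_cursor = end if page and end < reachable else None
    return page, next_cursor


def scrape_listing(posts: List[int]) -> List[int]:
    """Follow cursors until the listing stops handing them out."""
    collected: List[int] = []
    after: Optional[int] = None
    while True:
        page, after = fetch_page(posts, after)
        collected.extend(page)
        if after is None:
            break
    return collected


if __name__ == "__main__":
    small = list(range(400))   # e.g., a Redditor with fewer than 1,000 posts
    large = list(range(2500))  # e.g., a busy subreddit
    print(len(scrape_listing(small)))  # 400
    print(len(scrape_listing(large)))  # 1000
```

In this toy setup the small source is collected in full, while the large one silently stops at 1,000 even though 2,500 posts exist. That is the behavior to watch for when a tool depends on a single listing.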
Some quick research indicates that using PushShift might be the key to solving this problem. I have not yet researched this workaround very deeply, but it does appear to be a legitimate solution to the problem. Per the documentation:
I have not written any applications integrating PushShift at this time, but I plan to do so relatively soon. In the meantime, it appears that another user has published a solution (which I have not tested):