May Bot

So, after seeing a lot of examples of chat bots on Twitter (and considering the number of them), I decided to write my own. The bot is built from my old tweets so that what it generates is as close to my tweeting style as possible. This project uses Python 3. You need pip3 installed and the Tweepy module.

Getting the Data

Downloading your tweet archive from Twitter is the quickest way to collect your past tweets. To request the archive, go to the account settings on Twitter and scroll down to the archive request option. Once the email arrives, download the data and open it.

The data is presented with lots of filler to make it look like the Twitter interface, but all we need is the tweets.csv file. The CSV file contains a lot of fields we don't need (at least, for this Twitter bot), so open it in an editor like Excel or LibreOffice Calc. From here we select the column titled "text", which holds the tweet bodies.

Save the file as a .txt file, name it something like tweets.txt and open it in a text editor. There should now be a single column of tweet bodies, one per line. By hand or using RegEx (really, you should use RegEx; going through 1000+ tweets by hand will be brutal), remove the following:

  1. Tweets with @ names, as Twitter has issues with bots that tweet at people who don't already follow the bot.
  2. RTs, which won't be your words.
  3. Tweets that are only 1 or 2 words long. This makes the bot's input lengthier and gives it better choices of words.
  4. Punctuation, if you like. It can be glitchy, but YMMV.

Once this is done, add 'BEGIN NOW' to the front of each tweet and 'END' to the end of each tweet. These markers tell the bot where a tweet starts and where it stops. For example:

BEGIN NOW My hormones are to powerful for you traveller! END
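If you would rather not do the clean-up by hand, the rules above can be sketched in Python. This is only a rough sketch, not the method I used: the helper names clean_tweets and read_text_column are made up for illustration, the "tweet_id" column in the sample is an assumption (only the "text" column is known from the archive), and the "@" check is deliberately blunt.

```python
import csv
import io

def clean_tweets(texts, min_words=3):
    """Filter raw tweet bodies and wrap the survivors in BEGIN NOW ... END."""
    lines = []
    for text in texts:
        text = " ".join(text.split())      # put multi-line tweets on one line
        if "@" in text:                    # rule 1: skip tweets with @ names
            continue
        if text.startswith("RT "):         # rule 2: skip retweets
            continue
        if len(text.split()) < min_words:  # rule 3: skip 1- or 2-word tweets
            continue
        lines.append("BEGIN NOW " + text + " END")
    return lines

def read_text_column(csv_file):
    """Yield the "text" column from an open tweets.csv-style file."""
    for row in csv.DictReader(csv_file):
        yield row["text"]

# Tiny in-memory stand-in for tweets.csv (the real file has more columns).
sample_csv = io.StringIO(
    "tweet_id,text\n"
    "1,My hormones are to powerful for you traveller!\n"
    "2,RT someone: not my words\n"
    "3,@friend hello there you\n"
    "4,Too short\n"
)
cleaned = clean_tweets(read_text_column(sample_csv))
print("\n".join(cleaned))
```

For the real archive you would pass open("tweets.csv", newline="") instead of the sample, and write the joined lines out to tweets.txt.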

Actual coding time!

Now that the text is formatted, we can work on the Python script. This script converts the text input into a pickle file.

But May, what is Pickling?

Glad you asked! Pickling is the process of preserving food by placing it in vinegar... er, serializing data so it can be stored and reused later easily. This script converts the text data to trigrams so we can use them later for making our tweets/shitposts.

Make a new file and call it "". This makes it a Python script and allows it to run when we need it to. First, we add all the basic stuff we need.

import pickle

tweets = open("tweets.txt", "r")  # the cleaned tweet file from the previous step
chain = {}

This opens the text file as tweets, using the "r" mode so we can read it. chain is set up as an empty dictionary to hold the data we build in this script and output later. We now need to add the first segment of code to make the trigrams.

The bot uses trigrams to generate sentences. Each generated sentence starts from the pair BEGIN NOW, and every following word is picked at random from the words that came after the previous two words somewhere in the text input.

def generate_trigram(words):
    if len(words) < 3:
        return
    for i in range(len(words) - 2):
        yield (words[i], words[i + 1], words[i + 2])

This most likely won't make much sense right now, but this section converts each line into trigrams: every run of three consecutive words in the line. The bulk of the code is in the main section, which runs through the txt file and converts it into a chain.

for line in tweets.readlines():
    words = line.split()
    for word1, word2, word3 in generate_trigram(words):
        key = (word1, word2)
        if key in chain:
            chain[key].append(word3)
        else:
            chain[key] = [word3]

So line by line,

  • Read each line of the tweet file into the script.
  • Split the line into a list of words along spaces.
  • Pass the words to generate_trigram above, which yields each run of three consecutive words as word1, word2 and word3.
  • Set the key to be the pair (word1, word2).
  • If the key is already in the chain, add word3 to its list of possible next words.
  • If not, start a new list for that key containing word3.

This might be a bit complex in words. It's hard to visualise, but it's sort of like a graph of choices: each pair of words is linked to the words that could come next, based on how they were used in the past.

For example, we can use two simple sentences to visualise this:

"The Cat Sat On The Mat"

"The Rat Sat On The Bat"
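To make the graph of choices concrete, here is a small sketch that builds the chain for just these two sentences. It reuses generate_trigram from the pickler; build_chain is a helper name made up for this example, and setdefault is simply a shorter way of writing the same if/else.

```python
def generate_trigram(words):
    # Same generator as in the pickler script.
    if len(words) < 3:
        return
    for i in range(len(words) - 2):
        yield (words[i], words[i + 1], words[i + 2])

def build_chain(lines):
    """Map each pair of words to the list of words seen right after it."""
    chain = {}
    for line in lines:
        for word1, word2, word3 in generate_trigram(line.split()):
            chain.setdefault((word1, word2), []).append(word3)
    return chain

chain = build_chain(["The Cat Sat On The Mat", "The Rat Sat On The Bat"])
for key, followers in chain.items():
    print(key, "->", followers)
```

The interesting entry is ('On', 'The'), which maps to ['Mat', 'Bat']: once the bot has written "On The" it can finish with either word, which is how the two sentences get mixed together.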

This is a very simple version of what we're making, as ours might have hundreds of thousands of choices, depending on the amount of input. Now we can add the final line to the pickler before testing it:

pickle.dump(chain, open("chain.p", "wb"))

This pickles the chain and saves it as chain.p. Run this script from the command line by navigating to the file and typing "python3". Hopefully there should be no errors and no output. Check the folder for a new file called chain.p. You can open it, but it is mostly jumbled words and letters. Congrats! Part 1 is done. Now for generating text!
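If you want to check that the pickle really holds your chain, you can load it back and look inside. The sketch below is self-contained, so it writes a tiny made-up chain to a throwaway folder first; for the real thing you would just open your own chain.p.

```python
import os
import pickle
import tempfile

# A tiny made-up chain standing in for the real one.
chain = {
    ("BEGIN", "NOW"): ["Hello"],
    ("NOW", "Hello"): ["world"],
    ("Hello", "world"): ["END"],
}

# Write to a temporary folder here; the tutorial's real file is chain.p.
path = os.path.join(tempfile.mkdtemp(), "chain.p")
with open(path, "wb") as f:
    pickle.dump(chain, f)

# Load it back and peek at what the bot can start a tweet with.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(len(loaded), "keys")
print(loaded[("BEGIN", "NOW")])
```

The list under ("BEGIN", "NOW") holds every first word from your input, so it is a quick sanity check that the markers were added correctly.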

The Markov Chainer

Create a new file called "". This is the file that actually tweets and forms the sentences using the chain file.

import tweepy, pickle, random, time

def markov():
    chain = pickle.load(open("chain.p", "rb"))
    tweet = []
    sword1 = "BEGIN"
    sword2 = "NOW"
    while True:
        sword1, sword2 = sword2, random.choice(chain[(sword1, sword2)])
        if sword2 == "END":
            break
        tweet.append(sword2)
    fintweet = " ".join(tweet)
    return fintweet

The first line handles the imports. We need tweepy to handle calling the API, pickle to load the chain file, random to randomly pick words and time to allow the script to post at intervals. The first def we add is the one that actually walks the Markov chain. We set up an empty tweet and the first two words, sword1 and sword2, so the chain has a starting point among the words we stored.

The script sets sword1 to sword2 and sets sword2 to a random choice from the chain, using the previous two words to pick the next one. If sword2 is END the loop breaks; otherwise the script adds sword2 to the tweet and repeats. When the loop breaks, the words are joined with spaces between them. This is returned as fintweet when the function is called.
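To watch the walk happen without a chain.p file or a Twitter account, here is a sketch of the same loop running on a tiny hand-written chain. The generate function and its rng parameter are my own additions for this example (passing a seeded random.Random makes the run repeatable); the loop itself is the same as in markov().

```python
import random

def generate(chain, rng=random):
    """Walk the chain from BEGIN NOW until END and return the sentence."""
    words = []
    word1, word2 = "BEGIN", "NOW"
    while True:
        # Slide the window along one word and pick a follower at random.
        word1, word2 = word2, rng.choice(chain[(word1, word2)])
        if word2 == "END":
            break
        words.append(word2)
    return " ".join(words)

# Hand-written chain: after "The" the bot can go to Cat or Rat.
chain = {
    ("BEGIN", "NOW"): ["The"],
    ("NOW", "The"): ["Cat", "Rat"],
    ("The", "Cat"): ["Sat"],
    ("The", "Rat"): ["Sat"],
    ("Cat", "Sat"): ["END"],
    ("Rat", "Sat"): ["END"],
}
tweet = generate(chain, random.Random(0))
print(tweet)
```

Depending on the random pick, this prints either "The Cat Sat" or "The Rat Sat".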

The next section handles the API Call.

def get_api(cfg):
    auth = tweepy.OAuthHandler(cfg['consumer_key'], cfg['consumer_secret'])
    auth.set_access_token(cfg['access_token'], cfg['access_token_secret'])
    return tweepy.API(auth)

These lines build the authenticated API object when get_api is called in the main script. It reads in cfg, a dictionary we set up in the main body below, and returns a Tweepy API object authenticated with Twitter, allowing the bot to post.

The final code block checks the length, posts the tweet and sets the bot to sleep for a certain amount of time.

def main():
    # Fill these in with the keys from your Twitter app page.
    cfg = {
        "consumer_key": "",
        "consumer_secret": "",
        "access_token": "",
        "access_token_secret": "",
    }
    api = get_api(cfg)
    while True:
        # Keep generating until a tweet fits in 140 characters.
        while True:
            fintweet = markov()
            if len(fintweet) < 140:
                break
        api.update_status(status=fintweet)
        time.sleep(3600)  # wait an hour before the next tweet

if __name__ == "__main__":
    main()

First we have the details that need to be filled in. To get them, go to dev.twitter for the account the bot is going to post from. This link goes to the bot application page. Fill in the form and, once it is complete, you will be shown a page for the bot. Go to "Keys and Access Tokens" to get the first two values, and click on the button below labelled "Generate Access token" for the last two. Copy these into the cfg dictionary to allow the bot to tweet.

The api variable is set to whatever get_api returns. After this the bot repeatedly calls markov() until a tweet shorter than 140 characters is made. It then breaks out of the loop and sends the tweet to Twitter. After this the bot rests for an hour, or 3600 seconds, and then starts over. This is to keep Twitter from banning the bot for spamming by posting too frequently.
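One thing the retry loop doesn't guard against is a chain that never produces a short enough tweet, in which case it would spin forever. Here is a sketch of one way to cap it. next_tweet, limit and max_tries are names I made up for this example, and rejecting empty tweets is my own extra check, not something the bot above does.

```python
def next_tweet(make_tweet, limit=140, max_tries=100):
    """Call make_tweet() until it returns non-empty text shorter than limit."""
    for _ in range(max_tries):
        candidate = make_tweet()
        if 0 < len(candidate) < limit:
            return candidate
    return None  # give up instead of looping forever

# Stand-in generator for the demo: the first result is too long, the second fits.
attempts = iter(["x" * 200, "short enough"])
print(next_tweet(lambda: next(attempts)))
```

In the bot you would pass markov as make_tweet and skip posting for that hour if None comes back.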

The bot is essentially finished. At this point running the script should post tweets to twitter, showing it works.

But my bot isn't tweeting?

If the bot doesn't work there are some things that should be checked, mainly: that the API keys are correct, that the tweets are definitely below 140 characters (important if you add text to the tweet, like early may ebook tweets did), that the correct modules are installed, and that the pickler and chain files are all correctly arranged (they should all be in the same folder together).

But my tweets aren't that good?

Try removing some of the tweets that come up often. Change the status call to print(fintweet) and dry-run the bot to see which tweets and fragments come up unchanged. Removing these can help add more unique content. As well as this, removing certain tweets (like those below 20 characters) can help lengthen the generated tweets. Finally, tweak the code! Mine is by no means perfect (or even good...) so improve it!
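Spotting unchanged tweets by eye gets tedious, so here is a sketch of a dry run that counts them for you. dry_run is a made-up helper, not part of the bot: it takes any function that returns a tweet (markov would do) and your original tweets with the BEGIN NOW / END markers stripped off.

```python
from collections import Counter

def dry_run(make_tweet, source_tweets, runs=50):
    """Generate `runs` tweets without posting; count verbatim copies of the input."""
    originals = set(source_tweets)
    copies = Counter()
    for _ in range(runs):
        tweet = make_tweet()
        if tweet in originals:
            copies[tweet] += 1
    return copies

# Demo with a stand-in generator that alternates between a copy and a new mix.
outputs = iter(["The Cat Sat On The Mat", "The Cat Sat On The Bat"] * 2)
copies = dry_run(lambda: next(outputs), ["The Cat Sat On The Mat"], runs=4)
print(copies.most_common())
```

Tweets near the top of most_common() are the ones the bot keeps parroting, so they are good candidates to remove from the input.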


There we go, a bot tutorial that took me 2 days to write. I hope it was good and helped you! If you have questions, comments or corrections, my Twitter is here, but you probably knew that because not many people will really see this. You can see mayebooks and other bots running there.


My "Ebooker", software to streamline the process, is available on GitHub here!