---
<font size = 6><center><font face = garamond> <b> Mathematics in the context of AI</center>
<font size = 5><center><font face = garamond> <b> Markov Chain Sentence Generator</center>

---

Joaquin Carbonara 4/28/2024

---



## Introduction
This notebook demonstrates how to read a book from Project Gutenberg and use it to train a Markov chain model for generating random sentences. A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In the context of language, it can be used to generate sentences that resemble the style of the source text.

In [1]:
# Importing necessary libraries
import requests
import random

In [2]:
# Function to download a book from Project Gutenberg
def download_book(url):
    try:
        response = requests.get(url)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f'Error downloading the book: {e}')
        return None

# Function to build a Markov chain model from text
def build_markov_chain(text, chain={}):
    words = text.split()
    index = 1

    for word in words[index:]:
        key = words[index - 1]
        if key in chain:
            chain[key].append(word)
        else:
            chain[key] = [word]
        index += 1

    return chain

# Function to generate a random sentence
def generate_sentence(chain, count=15):
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    for i in range(count-1):
        word2 = random.choice(chain[word1])
        word1 = word2
        sentence += ' ' + word2

    sentence += '.'
    return sentence

In [3]:
# Main function to execute the script
def main():
    # URL of a book from Project Gutenberg
    book_url = 'https://www.gutenberg.org/files/1342/1342-0.txt'  # Example: Pride and Prejudice by Jane Austen

    # Download the book
    book_text = download_book(book_url)

    # Check if the book was downloaded successfully
    if book_text:
        # Build the Markov chain model
        markov_chain = build_markov_chain(book_text)

        # Generate and print a random sentence
        random_sentence = generate_sentence(markov_chain)
        print('Generated Sentence:', random_sentence)
    else:
        print('Failed to download the book.')

# Run the main function
if __name__ == '__main__':
    main()

Generated Sentence: Eliza? Do not afraid you canât write; so accomplished! My poor sister. Before any message.


## Understanding Markov Chains in Sentence Generation
In this notebook, the Markov chain model is used to generate sentences. Each word in the generated sentence is based on the probability of that word following the preceding word, according to the data from the book used to train the model. This is a fundamental property of Markov chains where the next state (or word, in this case) depends only on the current state and not on the sequence of events that preceded it. The `build_markov_chain` function constructs this model by mapping each word to the list of words that follow it in the source text, thus creating a simple but effective Markov chain for text generation.


----

<font size = 5><center><font face = garamond> <b> Entropy of the value of a key word in the Markov Chain dictionary
</center>

----

In [21]:
import requests
from collections import Counter
import math

word='sly'

book_url = 'https://www.gutenberg.org/files/1342/1342-0.txt'  # Example: Pride and Prejudice by Jane Austen

# Download the book
book_text = download_book(book_url)

# Check if the book was downloaded successfully
if book_text:
    # Build the Markov chain model
    markov_chain = build_markov_chain(book_text)

In [30]:
from collections import Counter
words=markov_chain[word]

# Count the frequency of each word
word_counts = Counter(words)

# Calculate the probability of each word
total_words = len(words)
word_probabilities = {word: count / total_words for word, count in word_counts.items()}

# Calculate the entropy
entropy = 0
for probability in word_probabilities.values():
    if probability > 0:
        entropy += probability * math.log(1 / probability, 2)
max_ent=math.log(total_words,2)
print ("| entropy of", word,":",entropy,"| max_entropy :",max_ent,"| rel_entropy :",entropy/max_ent,"|")

| entropy of sly : 1.0 | max_entropy : 3.5849625007211565 | rel_entropy : 0.2789429456511298 |
