Fantasy Name Generator

A couple of weeks ago, while I was reading the second book of the Mistborn saga by Brandon Sanderson, I stumbled upon a strange name: Tathingdwen. I wondered how someone comes up with names like this, and whether I could automate the process.

That’s how I fell into the rabbit hole of name generators. The approaches I found on the web fall roughly into three groups:

picking a bunch of syllables and combining them randomly
using a Markov chain model
using deep learning techniques focused on NLP
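To make the second approach concrete, here is a minimal sketch of a character-level Markov chain generator. It is not the code used in this post; the function names, the `^` and `$` markers and the default order of 2 are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def build_chain(names, order=2):
    """Map each `order`-character context to the characters seen after it."""
    chain = defaultdict(list)
    for name in names:
        # "^" marks the start of a name and "$" marks its end (illustrative markers)
        padded = "^" * order + name.lower() + "$"
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain


def generate(chain, order=2, max_len=12, rng=None):
    """Walk the chain from the start context until the end marker or max_len."""
    rng = rng or random.Random()
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(chain[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out).capitalize()


chain = build_chain(["Tathingdwen", "Harzula", "Jantian", "Parambar"])
print(generate(chain, rng=random.Random(42)))
```

With so few training names the output mostly reproduces or splices the inputs; a larger list gives more varied results.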

I experimented a bit with an LSTM, customizing the code from this webpage. However, I wanted to get my hands dirtier, so I tried another approach: genetic algorithms.

Genetic algorithms

The basic idea is to treat the characters of a string as genes and to swap portions of the parents’ strings via crossover. On top of that, we can add character (gene) mutation.
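As a rough illustration of these two operators, the sketch below applies single-point crossover and random mutation directly to strings. It is a simplification: PyGAD works on integer arrays rather than strings, and the helper names and the way a mutated character is chosen here are assumptions made for the example.

```python
import random
import string


def single_point_crossover(parent_a, parent_b, rng):
    """Cut both parents at the same index and swap the tails.

    Assumes both parents have at least two characters.
    """
    point = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def mutate(name, rng, probability=0.15):
    """Replace each character by a random lowercase letter with the given probability."""
    return "".join(
        rng.choice(string.ascii_lowercase) if rng.random() < probability else c
        for c in name
    )


rng = random.Random(42)
child_a, child_b = single_point_crossover("harzula", "jantian", rng)
print(child_a, child_b)      # each child mixes a head of one parent with a tail of the other
print(mutate(child_a, rng))  # same length, possibly with a few letters changed
```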

Obviously we need an initial population, which I found online (Kismet’s Fantasy Name Compendium).

I preprocessed the input in order to obtain another file with a single name for each line:

import csv

names = []
with open("namedb.csv") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    next(csv_reader)  # skip first line with column names
    for row in csv_reader:
        for name in row:
            if name != "":
                names.append(name)

with open("names.txt", "w") as f:
    for word in names:
        f.write(word + "\n")

As the “engine” for the genetic algorithm, I used PyGAD. The last thing I needed was a fitness function.

The fitness function is used to select the best parents to mate at each generation. The basic idea is to favor names with a good balance of consonants and vowels, with an optional bias for word length. This is the equation I constructed:

$$fitness = \frac{1}{|vowels - consonants| + bias} + lengthbias \cdot wordlength$$

where $vowels$ is the number of vowels, $consonants$ is the number of consonants and $wordlength$ is the length of the name.

A $bias$ of $0.000000001$ is used to prevent division by $0$ when the two counts are equal.
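To get a feel for the numbers, here is a small sketch that evaluates the same formula on plain strings. It is a simplification of the function shown in this post: it lowercases the name first (an assumption of this sketch, so the capital initial is counted too), and `name_fitness` is a made-up helper name.

```python
import string

BIAS = 1e-9
VOWELS = "aeiou"
ALPHABET = string.ascii_lowercase + "'"


def name_fitness(name, length_bias=0.0):
    """Evaluate 1 / (|vowels - consonants| + bias) + length_bias * word_length."""
    letters = name.lower()
    num_vowels = sum(1 for c in letters if c in VOWELS)
    num_consonants = sum(1 for c in letters if c in ALPHABET and c not in VOWELS)
    return 1 / (abs(num_vowels - num_consonants) + BIAS) + length_bias * len(name)


for name in ["Obtovene", "Harzula", "Oeta"]:
    print(name, name_fitness(name))
```

Under these assumptions "Obtovene" (4 vowels, 4 consonants) scores about $10^9$, "Harzula" (3 vowels, 4 consonants) scores about $1$, and "Oeta" (3 vowels, 1 consonant) scores about $0.5$, so perfectly balanced names dominate the ranking by a wide margin.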

This is the code for the fitness function:

import string

alphabet = string.ascii_lowercase + "'"
vowels = ["a", "e", "i", "o", "u"]
consonants = [c for c in alphabet if c not in vowels]
bias = 0.000000001  # avoids division by zero for perfectly balanced names

def fitness_function_factory(length_bias=1):
    # PyGAD passes genes as integers, so compare against character codes
    vowels_ord = [ord(c) for c in vowels]
    consonants_ord = [ord(c) for c in consonants]

    def fitness_function(solution, solution_idx):
        num_vowels = sum(1 for c in solution if c in vowels_ord)
        num_consonants = sum(1 for c in solution if c in consonants_ord)
        # zeros are padding added to give every name the same length,
        # so they do not count towards the word length
        zeros_length = sum(1 for c in solution if c == 0)
        word_length = len(solution) - zeros_length
        # reward a good balance of consonants and vowels and, optionally, longer words
        fitness = 1 / (abs(num_vowels - num_consonants) + bias) + length_bias * word_length
        return fitness

    return fitness_function
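One caveat on the signature: the two-argument form `fitness_function(solution, solution_idx)` is what the code in this post relies on. As far as I know, PyGAD 2.20.0 and later pass the `pygad.GA` instance as an extra first argument. Assuming such a version, a small adapter along these lines would let a two-argument fitness function be reused unchanged (the names `adapt_for_three_argument_api` and `count_non_padding` are made up for this sketch):

```python
def adapt_for_three_argument_api(fitness_function):
    """Wrap a (solution, solution_idx) fitness function for callers that also pass the GA instance."""

    def wrapped(ga_instance, solution, solution_idx):
        # The GA instance is ignored because the wrapped function does not need it
        return fitness_function(solution, solution_idx)

    return wrapped


# Stand-in fitness function for the demo: counts the non-zero (non-padding) genes
def count_non_padding(solution, solution_idx):
    return sum(1 for gene in solution if gene != 0)


adapted = adapt_for_three_argument_api(count_non_padding)
print(adapted(None, [79, 101, 116, 97, 0, 0], 0))  # 4
```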

We also need a couple of helper functions to transform the initial population into an array of integers (PyGAD handles only int and float values) and to transform the results back into human-readable strings.

import numpy as np

# taken from https://stackoverflow.com/a/32043366/1262118
def numpy_fillna(data):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])

    # Mask of valid places in each row
    mask = np.arange(lens.max()) < lens[:, None]

    # Setup output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=int)
    out[mask] = np.concatenate(data)
    return out

def convert_string_to_integers(s):
    return [ord(c) for c in s]


def convert_integers_to_string(values):
    # drop the zero padding before mapping codes back to characters
    removed_zeros = filter(lambda v: v != 0, values)
    return "".join([chr(v) for v in removed_zeros])


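def get_initial_population(filename):
    # read_data is not shown in this post; it is expected to return
    # the names in the file as a list of strings
    words = read_data(filename)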
    words_ord = [convert_string_to_integers(word) for word in words]
    filled = numpy_fillna(words_ord)
    return filled
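Since `read_data` does not appear in this post, the sketch below includes a hypothetical stand-in for it (one name per line, blank lines skipped) and walks through the encode, pad and decode round trip on two names. It uses plain Python lists instead of the NumPy masking trick above, purely to keep the example short.

```python
def read_data(filename):
    """Hypothetical stand-in: return the non-empty lines of a text file as a list of names."""
    with open(filename, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


words = ["Oeta", "Harzula"]
codes = [[ord(c) for c in word] for word in words]

# Right-pad every row with zeros up to the longest name, as numpy_fillna does
width = max(len(row) for row in codes)
padded = [row + [0] * (width - len(row)) for row in codes]
print(padded[0])  # [79, 101, 116, 97, 0, 0, 0]

# Decoding drops the padding zeros and maps the remaining codes back to characters
decoded = ["".join(chr(v) for v in row if v != 0) for row in padded]
print(decoded)  # ['Oeta', 'Harzula']
```

Using 0 as the padding value is safe here because no printable character has code 0.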

Finally, we can build the PyGAD instance, run it and collect the results, keeping only the new names from the final population by filtering out the existing ones:

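import math

import pygad


def get_random_solutions(num, solutions, seed=42):
    # use a seeded NumPy generator: the indices are drawn with NumPy,
    # so seeding the random module would have no effect
    rng = np.random.default_rng(seed)
    length, _ = solutions.shape
    idx = rng.integers(length, size=num)
    selection = solutions[idx, :]
    words = [convert_integers_to_string(solution) for solution in selection]
    return words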


def write_new_names_to_disk(names, oldnames, filename="new_names_genetic.txt"):
    with open(filename, "w") as f:
        for name in names:
            if name not in oldnames:
                f.write(name + "\n")


initial_pop = get_initial_population("names.txt")
pop_length, _ = initial_pop.shape
num_parents_mating = math.floor(0.75 * pop_length)

ga_instance = pygad.GA(
    num_generations=3,
    fitness_func=fitness_function_factory(length_bias=0),
    num_parents_mating=num_parents_mating,
    mutation_type="random",
    mutation_probability=0.15,
    initial_population=initial_pop,
    crossover_type="single_point",
    crossover_probability=0.7,
    # mutation_percent_genes=(12, 8),
    gene_type=int,
)

ga_instance.run()
sols = get_random_solutions(200, ga_instance.population)
old_names = read_data("names.txt")
write_new_names_to_disk(sols, old_names)

Here are some of the resulting new names:

Radel`in
Harzula
Jantian
Ralzimaa
Mxmola
Aodog
Parambar
Samssa
Jorodn
Oeta
Sdmeji
Qasha
Obtovene

You can find the whole code here.

2022-05-25

https://darthvi.com/post/fantasy-name-generator/ DarthVi