Mapping seeker journeys: A technical deep dive


[Image: abstract digital globe with green swirls and data points, representing global connectivity and data visualization.]

Overview

As an organisation focused on evangelism, CV is deeply invested in understanding the faith journeys of people all over the globe who respond to our evangelistic initiatives.
In this technical deep dive, we walk through the AI techniques and data engineering practices we used to convert more than 1,000 raw seeker testimonies into a structured dataset. The result offers insight into typical pathways to faith and currently informs some of our on-the-ground and digital evangelism work.

Approach and Methodology

1. Data preparation, standardisation, and information extraction

The initial task was to source testimony data from disparate systems within our organisation and blend it into a single dataset. These anonymised testimonies find their way into CV's repositories through reporting mechanisms linked to our evangelistic initiatives. This step involved the careful standardisation of fields like the country of mission and people group, ensuring that the data was ready for advanced processing. This crucial preprocessing phase was accomplished using Python libraries for data manipulation.
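As a minimal sketch of this standardisation step, the snippet below uses pandas to map free-text variants of a field onto one canonical value. The field names, example records, and alias table are illustrative assumptions, not CV's actual schema.

```python
import pandas as pd

# Hypothetical raw records blended from several reporting sources.
raw = pd.DataFrame({
    "country": ["usa", "U.S.A.", "India ", "india"],
    "people_group": ["Urban Youth", "urban youth", "Students", "students"],
})

# Illustrative alias table: lower-cased variants -> canonical country name.
COUNTRY_MAP = {"usa": "United States", "u.s.a.": "United States", "india": "India"}

def standardise(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with country and people-group fields normalised."""
    df = df.copy()
    key = df["country"].str.strip().str.lower()
    df["country"] = key.map(COUNTRY_MAP).fillna(df["country"].str.strip())
    df["people_group"] = df["people_group"].str.strip().str.title()
    return df

clean = standardise(raw)
```

The same pattern extends to any categorical field: build an alias table once, then map every incoming record through it.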

Additionally, a key aspect of this phase was the extraction of specific information regarding digital media involvement in the seekers' life-changing events. We utilised a locally hosted LLaMA 2 model, running on Mac M-series hardware, to identify and extract the names of digital platforms mentioned, such as Facebook, WhatsApp, and Phone, which were further standardised across the dataset to ensure uniformity.
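Whatever the LLM emits still needs normalising. The sketch below shows the post-processing half of this step: snapping free-text platform mentions onto canonical labels and deduplicating them. The alias table is a hypothetical example, not CV's actual mapping.

```python
import re

# Canonical platform names, keyed by lower-cased variants an LLM might emit.
PLATFORM_ALIASES = {
    "facebook": "Facebook", "fb": "Facebook",
    "whatsapp": "WhatsApp", "whats app": "WhatsApp",
    "phone": "Phone", "phone call": "Phone",
}

def standardise_platforms(extracted: list[str]) -> list[str]:
    """Map free-text platform mentions onto one canonical label each,
    dropping duplicates while keeping first-seen order."""
    seen, out = set(), []
    for name in extracted:
        key = re.sub(r"\s+", " ", name.strip().lower())
        canonical = PLATFORM_ALIASES.get(key, name.strip().title())
        if canonical not in seen:
            seen.add(canonical)
            out.append(canonical)
    return out

platforms = standardise_platforms(["FB", "WhatsApp", "whats app", "phone call"])
# → ["Facebook", "WhatsApp", "Phone"]
```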

2. Summarisation with generative AI

Each testimony was then assigned a unique identifier. Using Mixtral 8x7B, a state-of-the-art large language model hosted on a secure server, we employed generative AI to summarise each testimony. The goal was to distill complex, unstructured narratives into concise summaries that highlighted key life-changing events. Modern prompt-engineering techniques were applied to achieve the most accurate and insightful summaries. A portion of these summaries underwent manual validation to ensure the AI's outputs were both reliable and meaningful.
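A summarisation prompt in this style might look like the sketch below. The wording is illustrative only; it is not CV's actual prompt, and the event-count limit is an assumed parameter.

```python
def build_summary_prompt(testimony: str, max_points: int = 5) -> str:
    """Assemble an instruction-style prompt asking the model to distill
    a testimony into a short, ordered list of key life-changing events."""
    return (
        "You are summarising an anonymised personal testimony.\n"
        f"List at most {max_points} key life-changing events, one per line, "
        "in the order they occurred. Use short, neutral phrases.\n\n"
        f"Testimony:\n{testimony}\n\nKey events:"
    )

prompt = build_summary_prompt("I first heard the Gospel through a video...", max_points=3)
```

Keeping the prompt in a single function makes it easy to version, A/B test, and reuse across the summarisation and tagging stages.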

3. Identifying events using topic modelling

Summarised key events across all testimonies were clustered based on semantic similarity. We used a state-of-the-art algorithm called fast clustering, which efficiently grouped similar event summaries, even when worded differently. This step was pivotal in identifying common event patterns across the testimonies. We then applied topic modelling: analysing each cluster (aided by AI) and assigning it a unique, meaningful event name, such as "Encountered the Gospel", "Baptism", and "Discipleship", each representing a milestone in a seeker's faith journey.
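The core idea behind fast clustering can be sketched in a few lines of NumPy: greedily seed communities of points whose cosine similarity exceeds a threshold, largest first. This is a simplified stand-in for the library implementation (e.g. sentence-transformers' community detection), operating on whatever sentence embeddings you supply; the threshold value is an assumption.

```python
import numpy as np

def fast_cluster(embeddings: np.ndarray, threshold: float = 0.75,
                 min_size: int = 1) -> list[list[int]]:
    """Greedy community detection in the spirit of fast clustering:
    each unassigned point seeds a cluster of every point whose cosine
    similarity to it is above the threshold."""
    # Normalise rows so dot products are cosine similarities.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T
    assigned = np.zeros(len(X), dtype=bool)
    clusters = []
    # Seed larger communities first by sorting on neighbour count.
    order = np.argsort(-(sim >= threshold).sum(axis=1))
    for i in order:
        if assigned[i]:
            continue
        members = [j for j in np.where(sim[i] >= threshold)[0] if not assigned[j]]
        if len(members) >= min_size:
            clusters.append(members)
            assigned[members] = True
    return clusters
```

In practice each event summary would first be embedded with a sentence-embedding model; the cluster indices then point back to the original summaries for naming.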

4. Event classification and tagging with AI

With an inventory of categorised events, the same AI model used for summarisation was employed to classify and tag each summary point with the appropriate event name—a process known as topic classification (assigning labels to different sections of text based on their content). Again, modern prompt-engineering techniques were used to ensure the tagging was both accurate and contextually appropriate. Manual validation and correction of these tags further enhanced the quality of the dataset.
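A topic-classification step like this has two halves: a prompt constraining the model to a fixed label inventory, and a parser that snaps the model's free-text reply back onto that inventory. The sketch below is illustrative; the label list comes from the article's examples plus an assumed "Other" fallback, and the prompt wording is not CV's actual prompt.

```python
EVENT_LABELS = ["Encountered the Gospel", "Baptism", "Discipleship", "Other"]

def build_tagging_prompt(summary_point: str, labels: list[str] = EVENT_LABELS) -> str:
    """Ask the model to pick exactly one label from the event inventory."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the following summary point into exactly one of these "
        f"event categories, answering with the label only:\n{options}\n\n"
        f"Summary point: {summary_point}\nLabel:"
    )

def parse_label(model_output: str, labels: list[str] = EVENT_LABELS) -> str:
    """Snap a free-text model reply back onto the inventory, falling back
    to 'Other' so invalid outputs are easy to spot during manual review."""
    reply = model_output.strip().lower()
    for label in labels:
        if label.lower() in reply:
            return label
    return "Other"
```

Routing unparseable replies to "Other" concentrates the manual validation effort exactly where the model was least confident.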

5. Visualisation and analysis

The culmination of this process was a curated dataset, structured in CSV format, that included testimony IDs, countries, people groups, digital platform information, and a sequenced list of life-changing events for each testimony. To bring this data to life, we utilised Power BI (a tool for creating visual reports and dashboards) to perform advanced visual analytics. The events and their transitions were visualised using network diagrams and sequence flow charts (visual representations that show the relationships and order of events), providing an intuitive and powerful representation of the commonalities and unique aspects of each faith journey.
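The network diagrams described above are driven by weighted edges between consecutive events. A minimal sketch of deriving those edges from per-testimony event sequences (the sample journeys are hypothetical):

```python
from collections import Counter

# Hypothetical event sequences, one list per testimony.
journeys = [
    ["Encountered the Gospel", "Discipleship", "Baptism"],
    ["Encountered the Gospel", "Baptism"],
]

def transition_counts(sequences: list[list[str]]) -> Counter:
    """Count ordered (event -> next event) pairs across all testimonies;
    these weighted edges feed directly into a network diagram."""
    edges = Counter()
    for seq in sequences:
        edges.update(zip(seq, seq[1:]))
    return edges

edges = transition_counts(journeys)
```

Exported as a source/target/weight table, this edge list can be loaded straight into Power BI for the network and sequence-flow visuals.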


Conclusion

This approach not only transformed raw, unstructured testimonies into a structured and insightful dataset but also provided a scalable methodology for future projects involving large-scale text analysis, information extraction, and pattern recognition using AI. The Power BI report below provides an overview of some of the conclusions of this work. 

If you want to contribute to this testimony database, please reach out.

To read more on this, see our featured article:

LLMs in evangelism: balancing risk and opportunity