Quantitative Evaluation of Soft Skills Assessment

By WiseWorld, July 17, 2024

Soft skills are essential in AI-driven workplaces, but measuring them is challenging. This study evaluates WiseWorld, an AI-powered platform assessing all 44 soft skills through story-based interactions. Data from 66 participants, focusing on 52 active users, showed moderate reliability (CV range of 70 to 100) and consistent performance. With an average playtime of 19 minutes and retention rates of 63% on day 1 and 41% on day 3, WiseWorld effectively engages users and provides meaningful skill evaluations. The findings position WiseWorld as a valuable tool for workforce development in an AI-driven economy.

Abstract

Soft skills are increasingly vital in AI-driven workplaces, yet their effective measurement remains challenging. This study examines WiseWorld, an AI-powered social simulation platform that assesses all 44 soft skills using dynamic, story-based interactions across six categories. Data were collected from 66 participants, focusing on 52 active users engaged in at least two sessions. The reliability of WiseWorld's assessments was evaluated using the Coefficient of Variation (CV). The analysis revealed moderate variability across different session scenarios, with a CV range of 70 to 100, indicating consistent performance. Engagement metrics - averaging 19 minutes of daily playtime with day-1 and day-3 retention rates of 63% and 41%, respectively - demonstrate the platform’s strong capacity to maintain user interest and yield meaningful skill evaluations. These findings position WiseWorld as a scalable and innovative tool that redefines how organizations measure and develop essential soft skills. It has broad implications for workforce development, offering data-driven insights to address skill gaps and enhance employee readiness in an AI-driven economy.

Introduction

As workplaces increasingly rely on AI, evaluating soft skills - critical competencies like adaptability and communication - has become a significant challenge. The demand for human-centric skills has surged in industries where AI automates technical tasks. Studies indicate that organizations with high soft skills proficiency see improved team performance and innovation rates. Traditional methods, such as structured interviews or multiple-choice questionnaires, are often limited in providing genuine insights. Research indicates that gamification can significantly sustain user interest and productivity, making it a promising approach for soft skills assessment (Altomari et al., 2023).

WiseWorld stands out by harnessing AI to replicate real-world interactions and evaluate user decisions across all 44 soft skills. The platform's distinct features include:

AI Narrator: The platform features one AI storyteller - akin to a Dungeons & Dragons–style “Dungeon Master” - who adapts each scenario in real time to reflect user decisions. This personalized approach ensures that every participant encounters a unique, customized journey based on their choices and behaviors.
Unbiased, Context-Aware Assessments: AI objectively analyzes user decisions, aligning them with situational appropriateness and skill relevance. The WiseWorld method can eliminate interviewer bias and minimize test preparation effects, providing a fair and accurate evaluation of soft skills.

WiseWorld engages users with an open-world, story-based framework that leverages gamification principles. The map-based interface guides users through daily narrative touchpoints, offering three different challenges each day. These lifelike episode scenarios evolve based on user actions, supporting authentic engagement and accurate evaluation of soft skills.

Sample Data

The study recruited 66 participants from three distinct sectors:

A coach training institute in the education domain,
A fintech company, and
An online healthcare platform that connects users to doctors via a mobile application.

This diverse group provided a wide range of professional backgrounds and skill levels, helping to capture robust interactions with AI-driven and gamified environments. Of the 66 individuals who initially signed up, 52 were considered “active” users - defined here as those who completed at least three episodes within a month. Focusing on these 52 active users ensured sufficient interaction data for more reliable analysis, striking an optimal balance between sample size and data integrity.

User Journey in WiseWorld

User Journey in WiseWorld: Step-by-Step

Personalize Goals and Avatar (First Sign-In): The user or a manager sets the primary objectives for skill development, and the user then creates a customized avatar (realistic or fictional) to reflect those goals.
Choose a Location: On the interactive map, the user selects a spot whose challenge aligns with the stated goals, so that each scenario directly supports the user's skill-development objectives.
Enter the Story: The user views three to five narrative slides that set the scene and describe the main challenge.
Make a Decision: The user responds to prompts like "What will you do?" and chooses a course of action.
Chat with the AI Character: The user engages in a dynamic conversation with the AI storyteller to share decisions or ask questions.
See the Consequences: The AI narrator updates the story based on the user’s decisions, revealing immediate consequences and guiding the user to the next narrative stage.
Receive Skill Scores: From a pool of 44 possible soft skills, the user receives scores (ranging from –5 to +5) on the 3–7 skills most relevant to their decisions. Each score is explained by how the user's choices demonstrated - or lacked - those targeted skills.
Review Feedback: The user receives concise, AI-generated insights that highlight strengths, pinpoint areas for improvement, and offer micro-learning strategies for similar future situations.

Upon entering WiseWorld, users are invited to create a character that can mirror their real-life persona or an entirely fictional identity. This character-creation process is designed to enhance the gamification aspect of WiseWorld. From a soft-skills perspective, however, the identity users adopt does not affect their measured proficiency: whether they play as themselves or as fictional characters, the AI-driven assessment focuses solely on the behaviors and decisions made in the simulation. Drawing inspiration from role-playing games like Dungeons & Dragons (D&D), WiseWorld lets users navigate interactive sessions grouped into three domains (work, life, and hobbies) according to their goals.

Each interactive episode begins with users selecting one of up to three locations on a map, each presenting a unique challenge. Upon selecting a location, the narration unfolds through 2-3 slides, detailing the situation, the involved parties, and the main challenge. After the narration, users are prompted with questions such as, "What would you do now?" Their responses initiate AI-driven interactions, continuing for up to five rounds. The AI evaluates these responses based solely on the quality of the user’s decision, determined by how well it aligns with the target soft skill. Post-decision analysis categorizes user performance across all 44 soft skills, displayed on a PowerWheel - a radar chart highlighting strengths and improvement areas. By visualizing scores intuitively, the PowerWheel enables users to quickly identify their best-performing soft skills and areas requiring development, offering a clear roadmap for personal growth.

Scores range from -5 to +5, reflecting user interactions and decisions. Positive scores indicate alignment with the target soft skill - for instance, a well-crafted response showcasing deductive reasoning would result in a higher score. Negative scores (below 0) signify that the skill was not used or that the user acted contrary to it - for example, denying another's feelings instead of showing empathy. Frequent, high-quality interactions improve the precision of evaluations, while limited interactions restrict the tool’s ability to provide comprehensive insights.

Feedback Mechanism in WiseWorld

At the end of each episode, once the user has finished the dialogue with the AI acting as a Dungeons & Dragons–style game master, the system transitions from interactive storytelling to reflective feedback. The AI first narrates the story’s outcome as shaped by the user’s decisions. It then provides personalized feedback, highlighting the strengths and areas for improvement in the user’s soft skills, as demonstrated during the episode.

This immediate, individualized feedback serves multiple purposes:

Reinforcement and Reflection: It reinforces positive behaviors and provides constructive criticism, helping users understand how their choices align with effective use of soft skills.
Guidance for Future Actions: Users can apply this feedback in subsequent episodes to improve decision-making and soft skill application.
Enhanced Learning Experience: By offering tailored insights right after each interaction, the platform supports continuous learning and self-awareness, making the assessment more impactful and actionable.

Methodology

Research Design

This study employed a quantitative framework to evaluate the reliability of WiseWorld’s soft skills assessments. The primary metric chosen was the Coefficient of Variation (CV), valued for its ability to standardize score variability and facilitate comparisons across diverse user interactions. CV is particularly suitable for applications where score variability scales proportionally to the mean, ensuring reliability across diverse datasets (Shechtman, 2013). This adaptability makes CV an effective measure for evaluating consistency in dynamic, user-driven environments like WiseWorld.

Why CV?

The Coefficient of Variation (CV) provides a robust method for assessing consistency across WiseWorld's dynamic episode scenarios. Given the platform's varied interactive sessions - each with different contexts, challenges, and targeted soft skills - CV’s adaptability is particularly suitable for an interactive, user-driven framework.

Key advantages include:

Standardization across episode scenarios: CV measures variability relative to the mean, enabling meaningful comparisons across diverse episodes and skill categories without bias from absolute score differences.
Dimensionless Nature: Because the CV is a unitless ratio of the standard deviation to the mean, it allows fair comparisons across datasets of varying scales, making it well suited to assessing consistency.
Sensitivity to Data Distribution: CV effectively captures relative dispersion even when data distributions deviate from normality, which is common in user-driven platforms like WiseWorld.

By employing CV, the study ensures that reliability assessments remain consistent and comparable across various episode scenarios, providing a precise, objective measure of the platform's performance.
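The calculation described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: the CV is taken as the standard deviation divided by the mean, expressed as a percentage. The function name `cv_percent`, the choice of sample standard deviation by default, and the use of the absolute mean are assumptions; the study does not say how it handles the -5 to +5 scale, where a user's mean score can be negative or zero.

```python
import statistics

def cv_percent(scores, sample=True):
    """Coefficient of variation as a percentage: 100 * SD / |mean|.

    Assumptions (not stated in the study): sample SD by default, and the
    absolute value of the mean, because scores on a -5..+5 scale can
    average out negative.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to measure variability")
    mean = statistics.fmean(scores)
    if mean == 0:
        raise ValueError("CV is undefined when the mean score is zero")
    sd = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return 100 * sd / abs(mean)

# Hypothetical episode scores for one user (illustrative values only).
example_scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(cv_percent(example_scores, sample=False), 1))  # 40.0
```

In this example the mean is 5 and the population standard deviation is 2, so the CV is 40%. A user whose scores average close to zero would produce a very large or undefined CV, which is one reason the handling of the mean needs to be stated explicitly in any real analysis.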

Data Collection Tools

WiseWorld evaluates 44 soft skills through user interactions within its gamified episode scenarios. Scores for each skill are calculated based on two primary factors:

Alignment with the Target Soft Skill: For example, a decision that demonstrates leadership aligns with a scenario that calls for delegation skills.
Contextual Fit: Responses are assessed based on how well they integrate into the ongoing narrative and contribute meaningfully to resolving the presented challenge. This includes evaluating the relevance of the response to the scenario's specific context and the logical progression of actions taken by the user within the story.

Scores range from -5 to +5, where:

Positive scores signify successful engagement with the target skill, reflecting actions that effectively contribute to overcoming the challenge.
Negative scores reflect missed opportunities or actions that detract from resolving the challenge, indicating areas where the soft skill was not effectively demonstrated.

Interaction Quality and Frequency: In WiseWorld version V0, users engaged in chat-based interactions with an AI narrator (akin to a D&D game master), with each episode allowing up to five responses. High interaction was therefore defined as reaching the maximum of five responses, which effectively completes the episode’s dialogue. This level of engagement provided richer data for skill evaluation. Notably, 57% of participants met this five-response threshold, indicating robust user involvement. Conversely, users who exited an episode early contributed fewer data points, reducing the accuracy of their soft skill assessment. Ensuring a minimum number of meaningful interactions remains essential for maintaining the reliability and validity of evaluation outcomes.

With the CV methodology established, the following section explores the reliability of WiseWorld’s assessments based on this metric.

Results: Coefficient of Variation (CV) Analysis

Descriptive Statistics and Reliability

The Coefficient of Variation (CV) analysis across participants yielded a mean of 82.63 and a median of 89.00, with most users falling within a moderate variability range of 70–100. The study treats this clustering of per-user CVs around the mean as an indicator of measurement reliability, as it suggests that most users' variability deviates only modestly from the group average. This supports WiseWorld’s consistent performance across diverse session scenarios.

Mean CV: 82.63
Median CV: 89.00
Standard Deviation: 24.83
Minimum CV: 26.37
Maximum CV: 112.25

Supporting Evidence of Reliability

Consistency Across Diverse Populations:

Most users - spanning various industries and backgrounds - maintained CVs within the moderate range. This consistency implies that WiseWorld’s AI-driven evaluations perform stably regardless of user differences, a key quality in psychometric assessments.

Comparison to Industry Benchmarks:

In many simulation-based assessments, higher variability is common due to uncontrolled narrative elements or less adaptive feedback systems. WiseWorld’s tighter variability suggests that its scenario design and adaptive mechanisms effectively stabilize outcomes.

Theoretical Support:

Research in assessment theory posits that moderate variability, especially with scores clustering near the mean, reflects a well-calibrated tool. This clustering indicates that scenario challenges are appropriately balanced, so results reflect actual skill levels rather than external noise.

Figure: Visualizing Variability of Users' Interaction on WiseWorld

Left Histogram (66 Users):

The distribution for all users, including those with limited interactions, shows most CVs clustering in the moderate range, visually reinforcing the platform's overall consistency.

Right Histogram (52 Users):

Focusing on users with sufficient data (≥2 episodes) sharpens this picture. The histogram excludes those with fewer interactions, providing a clearer view of assessment reliability among active users.

Both histograms demonstrate that the majority of users exhibit moderate variability (CV: 70-100), while a minority of high-variability outliers (CV > 100) highlight areas for scenario refinement and adaptive difficulty adjustments.

Taken together, the statistical clustering, the theoretical underpinnings, and the visual data from the histograms support the conclusion that WiseWorld delivers reliable, consistent results across diverse participants and episode scenarios. The moderate CV range indicates that the platform balances sensitivity to individual differences with overall stability, reinforcing its utility as a robust soft skills assessment tool.

Outlier Analysis

Approximately 15.4% of users displayed high variability (CV > 100), suggesting inconsistent decision-making or limited familiarity with the narrative roleplay-inspired format. In response, one of the updates implemented in WiseWorld version V1 moved from a single-narrator model to multiple intelligent NPC interactions, each with distinct personalities and adaptive behaviors. This shift reduces scenario ambiguity, allows for a broader range of user actions, and stabilizes the assessment process, ensuring a more accurate capture of typical and outlier user profiles.

User Segmentation

Participants were categorized into three groups based on CV values:

Low Variability (<70): 12 users (23.1%)
Moderate Variability (70-100): 32 users (61.5%)
High Variability (>100): 8 users (15.4%)

Low-variability users demonstrated stable and reliable results, while moderate-variability users validated WiseWorld’s ability to accommodate diverse behaviors. High-variability users highlighted areas for refinement.
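The segmentation above can be checked with a short sketch. The `segment` function is a hypothetical helper that applies the three CV bands as labelled in the list (below 70, 70 to 100 inclusive, above 100). The counts are the ones reported for the 52 active users, and the percentages are recomputed from them.

```python
def segment(cv):
    """Map a user's CV to the study's three variability bands."""
    if cv < 70:
        return "low"
    if cv <= 100:
        return "moderate"
    return "high"

# Counts reported in the study for the 52 active users.
counts = {"low": 12, "moderate": 32, "high": 8}
total = sum(counts.values())
shares = {band: round(100 * n / total, 1) for band, n in counts.items()}
print(total, shares)  # 52 {'low': 23.1, 'moderate': 61.5, 'high': 15.4}
```

The recomputed shares match the reported 23.1%, 61.5%, and 15.4%, and the three groups sum to the 52 active users.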

Discussion

User Engagement Metrics

The results highlight WiseWorld’s potential as a practical, scalable tool for organizations focused on soft skills. High engagement metrics indicate strong user interaction, which supports more accurate skill assessments. Planned enhancements - like adaptive session scenarios and dynamic dialogues - will further strengthen WiseWorld’s capacity to deliver meaningful, data-driven insights for diverse groups.

Session Duration: Each session lasted approximately 7 minutes, aligning with industry standards for effective engagement.
Daily Engagement: Users averaged 19 minutes of playtime daily, completing about three daily sessions.
Total Playtime: Participants who finished all intended challenges averaged about 63 minutes of cumulative playtime.
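As a rough consistency check on the three figures above, the sketch below derives what they imply about sessions per day and days of play. The derived values are back-of-the-envelope estimates from the reported averages, not numbers reported by the study, and the variable names are illustrative.

```python
SESSION_MINUTES = 7    # approximate length of one session
DAILY_MINUTES = 19     # average daily playtime
TOTAL_MINUTES = 63     # average cumulative playtime for completers

sessions_per_day = DAILY_MINUTES / SESSION_MINUTES   # about 2.7
total_sessions = TOTAL_MINUTES / SESSION_MINUTES     # 9.0
days_of_play = TOTAL_MINUTES / DAILY_MINUTES         # about 3.3

print(f"{sessions_per_day:.1f} sessions/day, "
      f"{total_sessions:.0f} sessions, {days_of_play:.1f} days")
```

Nineteen minutes a day at roughly 7 minutes per session works out to just under three sessions daily, which is consistent with the "about three daily sessions" figure. On the same assumptions, 63 minutes of total playtime corresponds to about nine sessions spread over a little more than three days.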

Gamified assessments may favor users already familiar with gaming environments (Altomari et al., 2023). To mitigate this, WiseWorld includes a structured onboarding process that orients newcomers to its interactive elements, helping ensure fair assessment for all.

Addressing Potential Biases

Participants from tech-centric industries may be more familiar with AI tools, potentially influencing results. Tailored onboarding processes can mitigate these biases, ensuring equitable user experiences across diverse backgrounds.

Understanding these potential biases provides essential context for placing WiseWorld’s engagement metrics within industry standards.

Comparisons to Benchmarks and Engagement Analysis

WiseWorld’s Performance Against Industry Standards:

Research in gamified learning environments typically reports average session durations ranging from 6 to 8 minutes (Hamari, Koivisto, & Sarsa, 2014). WiseWorld's individual episodes, at around 7 minutes each, fall within this range; however, users complete approximately 2.8 episodes per day, which brings average daily playtime to 19 minutes. This elevated daily engagement indicates high immersion and sustained value across multiple episodes, at a level the study regards as notably higher than industry norms.

This substantial difference implies that users are more immersed and consistently engaged with WiseWorld’s interactive, story-based episodes. Compared to traditional assessments, WiseWorld’s gamified approach offers a unique combination of engagement and reliability. Altomari et al. (2023) provide insights into the feasibility of serious games for soft skills assessment, offering a point of comparison for WiseWorld’s approach. WiseWorld addresses some limitations identified in similar tools by incorporating dynamic NPC interactions and adaptive session scenarios.

Enhanced Engagement Metrics:

WiseWorld further distinguishes itself with robust daily engagement statistics:

Average Daily Playtime: 19 minutes
Day-1 (D1) Retention Rate: 63%
Day-3 (D3) Retention Rate: 41%

These metrics indicate that users consistently return to the platform and dedicate significant time to each session. As gamification literature highlights, high retention rates and sustained daily usage are critical indicators of effective engagement strategies (Deterding, Dixon, Khaled, & Nacke, 2011; Hamari et al., 2014). For clarity, the D1 retention rate refers to the percentage of users returning to the platform one day after their initial session, while the D3 retention rate measures the percentage of users returning three days after registration.
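The retention definitions above can be illustrated with a small sketch. It makes two simplifying assumptions that the study does not confirm: a user's earliest active date stands in for both the initial session and registration, and a user counts as retained on day N only if they were active exactly N days after that date. The `retention` function and the activity log are hypothetical.

```python
from datetime import date, timedelta

def retention(activity, day_n):
    """Percent of users active exactly day_n days after their first day.

    `activity` maps a user id to the set of dates on which that user
    played; the earliest date is treated as the starting point.
    """
    if not activity:
        raise ValueError("no users to evaluate")
    returned = 0
    for dates in activity.values():
        start = min(dates)
        if start + timedelta(days=day_n) in dates:
            returned += 1
    return 100 * returned / len(activity)

# Hypothetical activity log for four users (illustrative only).
d = date(2024, 6, 1)
log = {
    "u1": {d, d + timedelta(days=1), d + timedelta(days=3)},
    "u2": {d, d + timedelta(days=1)},
    "u3": {d},
    "u4": {d, d + timedelta(days=3)},
}
print(retention(log, 1), retention(log, 3))  # 50.0 50.0
```

In this toy log, two of four users return on day 1 and two return on day 3, so both rates are 50%. Other conventions, such as counting any activity on or after day N, would give different numbers, so the convention should be stated alongside any reported rate.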

The platform’s engagement metrics, including an average daily playtime of 19 minutes and high retention rates, demonstrate its ability to captivate users effectively. This aligns with research showing that gamified solutions can significantly enhance user engagement while providing rich datasets for skill evaluation (AC Lion, 2023).

Implications for Reliability and Assessment:

The extended session durations and frequent daily interactions contribute to deeper user engagement, which enhances the quality and reliability of soft skills assessments. Longer and more consistent interactions provide a richer dataset, allowing WiseWorld’s AI to observe a broader range of user behaviors across multiple episode scenarios. This comprehensive data collection reduces the impact of outliers and random fluctuations, strengthening the platform’s assessment of human-centric abilities.

Future Exploration:

Further analysis could investigate the relationship between session duration and score consistency. Understanding how sustained engagement correlates with assessment reliability can provide insights into optimizing episode length and content, potentially leading to more compelling user experiences and more robust data collection.
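One simple way to start such an analysis is a Pearson correlation between each user's average session length and their CV. The sketch below is illustrative only: the study reports no such analysis, the data are made up, and `pearson_r` is a hand-written helper rather than part of WiseWorld.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-user averages (illustrative values only).
avg_session_minutes = [5.0, 6.5, 7.0, 8.0, 9.5]
user_cv = [105.0, 95.0, 88.0, 80.0, 72.0]
r = pearson_r(avg_session_minutes, user_cv)
print(round(r, 2))
```

A strongly negative coefficient on real data would suggest that longer sessions go with more consistent scores. With only 52 active users, though, any such estimate would carry wide uncertainty and would not by itself establish causation.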

Beyond metrics and comparisons, the user experience is crucial in engagement and assessment accuracy. The following section evaluates the UX enhancements made to WiseWorld.

UX Evaluation

Journey and Gamification Method:

WiseWorld’s design draws inspiration from role-playing frameworks like Dungeons & Dragons (D&D). It utilizes a gamified approach that immerses users in an interactive, decision-based journey. WiseWorld evaluates soft skills and transforms the assessment process into an engaging narrative adventure by creating episode scenarios that require deep engagement and thoughtful decision-making. This journey-based gamification method leverages familiar role-playing elements - such as character creation, story progression, and strategic choices - to make the process enjoyable while gathering meaningful data on user behavior and skill application.

Observations and Limitations:

In the current iteration, the platform employs a single AI narrator to guide users through episode scenarios and collect their responses. Data revealed that over 60% of users reached the chat interaction limit with this AI narrator, indicating high engagement but also highlighting a potential bottleneck in interaction complexity and realism. This limitation suggests that a single-narrator model may not fully capture the nuances of dynamic human conversation or provide sufficient variability in feedback.

New Approach to Enhance UX:

To address these limitations and further enhance user experience, the updated version of WiseWorld introduces dynamic dialogues with up to three smart Non-Player Characters (NPCs), each possessing distinct attitudes and personalities. This evolution from a single narrator to multiple NPCs aims to:

Simulate Real-World Interactions: By engaging with multiple characters, users experience a more complex and realistic social environment, mirroring real-life conversations and decision-making contexts.
Enrich Narrative Complexity: The presence of multiple NPCs allows for branching dialogues and more varied episode scenarios, increasing the depth and adaptability of each interaction.
Improve Assessment Precision: A more nuanced interaction model allows the AI to better evaluate soft skills across different conversation dynamics, leading to richer data and more accurate assessments.

Impact on User Experience and Assessment:

While the current study validates the reliability and practicality of the single-narrator version, these updates are designed to:

Sustain User Immersion: Offering varied NPC interactions enhances meaningful participation and reduces the likelihood of repetitive dialogue experiences.
Increase Accuracy: Simulating a broader range of conversational styles and episode scenarios provides a more comprehensive assessment environment.
Elevate Gamified Journey: The enhanced NPC framework deepens the narrative experience, making the journey more immersive and reflective of real-world social dynamics.

By evolving the user experience from a singular narrative path to a multi-character, dynamic dialogue system, WiseWorld reinforces its commitment to engaging gamification and rigorous soft skills assessment. This new approach addresses previous limitations and sets the stage for more effective and realistic evaluations, ultimately benefiting users and organizations.

Conclusion

This study confirms that WiseWorld is a reliable and engaging tool for soft skills assessment, demonstrating moderate score variability and robust user engagement metrics. By refining its session scenarios and transitioning to multi-NPC dialogues, WiseWorld enhances assessment precision and user experience. As the demand for soft skills grows in an AI-driven future, WiseWorld bridges the gap between skill development and workforce readiness, offering a scalable solution for organizations worldwide. Future iterations of WiseWorld can further revolutionize soft skills development, setting a new standard for workforce preparedness in an AI-driven era.
