The homophily phenomenon in social networks causes users to interact primarily with others who share their interests and cultural backgrounds, leading to the formation of "echo chambers" [1–3].
The notion of cultural diversity among users and communities becomes relevant in this context. While previous studies have investigated diversity in interaction graphs, to the best of our knowledge, none have measured diversity using community embedding, which has proven effective for positioning communities along various social dimensions [4–7].
Building on the work of [7] and on prior studies of diversity in social media, we propose a novel algorithm based on community embedding to characterize and measure diversity. The algorithm iteratively updates diversity values for both communities and individual users.
To demonstrate the effectiveness of our algorithm, we conduct a case study analyzing over 800 million posts in 9 million discussion subreddits of different ethnic groups on Reddit. We generate embeddings for each community using community2vec [8] and develop algorithms to quantify cultural diversity based on these embeddings.
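To make the idea of iteratively updating community and user diversity concrete, the following minimal sketch shows one possible reading of such a scheme. It is not the algorithm proposed in this work: the update rule, the use of cosine distance, the mixing weight `alpha`, the function names, and the toy two-dimensional embeddings are all assumptions chosen for illustration. In practice the embeddings would come from a model such as community2vec [8].

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (assumed dissimilarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def embedding_spread(activity, embeddings):
    """Activity-weighted mean pairwise distance over the communities one user posts in."""
    total = sum(activity.values())
    spread = 0.0
    for c1, n1 in activity.items():
        for c2, n2 in activity.items():
            weight = (n1 / total) * (n2 / total)
            spread += weight * cosine_distance(embeddings[c1], embeddings[c2])
    return spread

def iterate_diversity(user_activity, embeddings, alpha=0.5, tol=1e-9, max_iter=1000):
    """Hypothetical mutual update: communities inherit the diversity of their
    users, and users blend their own embedding spread with the diversity of
    the communities they are active in. alpha < 1 keeps the update a contraction."""
    base = {u: embedding_spread(act, embeddings) for u, act in user_activity.items()}
    user_div = dict(base)
    community_div = {c: 0.0 for c in embeddings}
    for _ in range(max_iter):
        # Community step: activity-weighted mean of member diversity.
        num = {c: 0.0 for c in embeddings}
        den = {c: 0.0 for c in embeddings}
        for u, act in user_activity.items():
            for c, n in act.items():
                num[c] += n * user_div[u]
                den[c] += n
        community_div = {c: (num[c] / den[c] if den[c] else 0.0) for c in embeddings}
        # User step: mix the user's own spread with the diversity of their communities.
        new_user_div = {}
        for u, act in user_activity.items():
            total = sum(act.values())
            neighbourhood = sum(n * community_div[c] for c, n in act.items()) / total
            new_user_div[u] = (1 - alpha) * base[u] + alpha * neighbourhood
        delta = max(abs(new_user_div[u] - user_div[u]) for u in user_div)
        user_div = new_user_div
        if delta < tol:
            break
    return user_div, community_div

# Toy data (hypothetical): three communities and post counts for three users.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
user_activity = {
    "u1": {"a": 10},
    "u2": {"a": 5, "b": 5},
    "u3": {"b": 2, "c": 8},
}
user_div, community_div = iterate_diversity(user_activity, embeddings)
print(user_div)
print(community_div)
```

In this toy setting, a user who posts in a single community (`u1`) starts with zero embedding spread but still receives a nonzero score, because that community is shared with a user whose activity spans distant communities (`u2`). This mutual dependence is what an iterative formulation is meant to capture.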