The socialroulette R package is a lightweight package for handling the recurrent problem of assigning individuals into groups of a fixed size. This happens, e.g., as part of mystery lunch scheduling or when assigning students into breakout rooms in video lectures. The work on the package was inspired by the blog post Long time, no see: Virtual Lunch Roulette,

The Problem

The aim is to partition \(n\) participants into groups of size at least \(m\). We shall denote the resulting groupings a partition of the set of participants. If \(m\) is not a divisor of \(n\) then some of the groups have to contain more than \(m\) participants. As an example consider the scenario that 5 individuals have to be divided into groups of size at least 2. We shall adopt the convention, that group size shall be as close to \(m\) as possible and the group sizes should be as equal as possible. In the specific toy example this means that we will need 2 groups, one with 3 participants and one with 2 participants.

Remark: Partitions with \(m=2\) are also known as a matchings and can be handled particularly efficient using special-purpose graph algorithms. However, we use the \(m=2\) example in order to illustrate the package functionality with a simple to follow example. Later sections of the vignette contain some more realistic examples with \(m>2\).

An R package for social roulette

We begin by loading relevant tidyverse packages and the socialroulette package.

We then generate a simple frame of 5 participants and use simple random sampling to assign them into groups:

today <- Sys.Date()
frame <- tibble::tibble( id=sprintf("id%.02d",1:5), date=today)
round1 <- socialroulette::rsocialroulette(current_frame = frame, m=2, algorithm="srs")
#> Partitioning 5 individuals into groups of at least 2 (no past partitions).
#> Created 2 groups of sizes 3 2.
#> [[1]]
#> [1] "id01" "id03" "id05"
#> [[2]]
#> [1] "id02" "id04"

An extension of this grouping problem occurs if we keep track of how participants were partitioned into groups the last time. It can be shown, that simple random sampling leads to quite high reunion probabilities. As an example: the probability to end up in the same group with at least one participant from last week when 50 individuals are partitioned into groups of \(m=4\) is about 20%. In order to improve on this, one can either use rejection sampling or a solver of the maximally diverse grouping problem.

#List of past partitions
past_partitions <- list(round1) %>% setNames(today)
#Same individuals one week later
frame2 <- frame %>% mutate(date = today+7)
#Check for each possible pair in frame2 how long ago (in days) it would be that they met. 
pairs_dist <- socialroulette::partitions_to_distance(frame2, past_partitions)
#> # A tibble: 10 x 4
#>    id1   id2   date        dist
#>    <chr> <chr> <date>     <dbl>
#>  1 id01  id02  2021-05-11    14
#>  2 id01  id03  2021-05-11     7
#>  3 id01  id04  2021-05-11    14
#>  4 id01  id05  2021-05-11     7
#>  5 id02  id03  2021-05-11    14
#>  6 id02  id04  2021-05-11     7
#>  7 id02  id05  2021-05-11    14
#>  8 id03  id04  2021-05-11    14
#>  9 id03  id05  2021-05-11     7
#> 10 id04  id05  2021-05-11    14

For comparison, we now use maximally diverse grouping problem solver on the same toy example. This corresponds to taking a sledgehammer to crush a nut, but helps to show how the package works:

round2 <- socialroulette::rsocialroulette(current_frame = frame2, past_partitions=past_partitions, m=2, algorithm="mdgp", time_limit=1)
#> [[1]]
#> [1] "id01" "id02" "id05"
#> [[2]]
#> [1] "id03" "id04"

Using this partitions we get the following re-unions, i.e. pairs of individuals which in round 2 meet again, despite already being in the same group in round 1. Note that it is easy to show that with 5 participants and \(m=2\), at least one re-union is needed.

partitions <- list(list(last_week = round1), list(this_week = round2))
pairs <- map(partitions, ~ socialroulette::partitions_to_pairs(.x))
inner_join(pairs[[1]], pairs[[2]], by=c("id1", "id2"))
#> # A tibble: 1 x 4
#>   date.x    id1   id2   date.y   
#>   <chr>     <chr> <chr> <chr>    
#> 1 last_week id01  id05  this_week

The total maximized distance (49) found for the selected partition can be confirmed as follows:

list(round2) %>% socialroulette::partitions_to_pairs() %>% 
  left_join(pairs_dist, by=c("id1","id2")) %>% 
  summarise(total_dist = sum(dist))
#> # A tibble: 1 x 1
#>   total_dist
#>        <dbl>
#> 1         49

Assigning Students into Breakout Rooms

We continue with a slightly more advanced example, where we have a class with 100 students, which for a weekly virtual lab exercise class need to be divided into groups of at least 4. Since for various reasons not all students show up to each class, the sampling frame of individuals to be partitioned each week changes accordingly. Still, we would like to make the partitioning of the current week s.t. students get as many new acquaintances as possible.

Create a history of previous partitions as well as the current frame from a population of 100 individuals participating in the social roulette.

# Class of 100 students with 4 previous lectures
students <- tibble::tibble(id=sprintf("", 1:100))
partition_dates <- seq(as.Date("2021-03-31"), length.out=4, by="1 week")

# Simulate changing participation each week for the last 4 weeks (70% attendance)
frames <- map_df( partition_dates, ~ 
            students %>% slice_sample(n = rbinom(1,nrow(.), prob=0.7)) %>% mutate(date=.x))

# Generate some past partitions using simple random sampling
past_partitions <- frames %>% 
   group_split(date) %>%
   map(~rsocialroulette(current_frame=.x, m=4, algorithm="srs")) %>%
#> Partitioning 61 individuals into groups of at least 4 (no past partitions).
#> Created 15 groups of sizes 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4.
#> Partitioning 72 individuals into groups of at least 4 (no past partitions).
#> Created 18 groups of sizes 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4.
#> Partitioning 70 individuals into groups of at least 4 (no past partitions).
#> Created 17 groups of sizes 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4.
#> Partitioning 76 individuals into groups of at least 4 (no past partitions).
#> Created 19 groups of sizes 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4.

For a more realist scenario, we pretend that each of the above previous partitions has been saved as a .csv file. For example like:

# Simulate the storage of each partition as a .csv file to disk
# with 3 columns: date, id1 and id2, i.e. all pairs
temp_dir <- tempdir() #adjust path to your setting if needed
socialroulette::partitions_to_pairs( past_partitions ) %>% 
  group_split(date) %>%
  walk(~  write_csv(x=.x, file=file.path(temp_dir, stringr::str_c("socialroulette-", .$date[1], ".csv"))))

We thus read the partitions from disk and convert from the stored pair-format (i.e. a data.frame listing each pair being in the same group as id1, id2 together with the corresponding date of the partition) back to the list-format (i.e. a list of character vectors, where each vector denotes a group and the vector contains the ids of all members of that group). This can be done as follows:

# Read again from file
pairs <- map_df(list.files(path=temp_dir, pattern="socialroulette.*", full.names=TRUE), ~read_csv(file=.x))

A small sanity check to ensure that the write and read did not really change any information:

# Convert into a pairs data.frame 
past_partitions2 <- pairs %>% socialroulette::pairs_to_partitions()
# Sanity check
all.equal(past_partitions, past_partitions2)  
#> [1] TRUE

As a next step we sample the students who are in the next class.

current_frame <-  students %>% slice_sample(n = rbinom(1,nrow(.), prob=0.7)) %>%
  mutate(date=max(partition_dates) + diff(partition_dates) %>% mean())

Our goal is now to partition the students 69 in current_frame. For each of the 2346 possbile pairs of students in that class, we determine how long ago it has been, since they were in the same group the last time. This can be done using the internal package function partitions_to_distance:

dist <- socialroulette::partitions_to_distance(current_frame, past_partitions)
dist %>% head()
#> # A tibble: 6 x 4
#>   id1                 id2                 date        dist
#>   <chr>               <chr>               <date>     <dbl>
#> 1 2021-04-28    35
#> 2 2021-04-28    35
#> 3 2021-04-28    35
#> 4 2021-04-28    35
#> 5 2021-04-28    28
#> 6 2021-04-28    21

Since the past partitions contain the last 4 weeks, students who have not been in a group so far are assigned a value of one time unit more than the last possible meeting opportunity. In the specific example this corresponds to \(7\cdot (4 + 1) = 35\) days. It thus seems natural to choose the groups, s.t. they do not contain pairs, which have already met in the past. However, for given size of the population, group sizes and meeting histories, this might not be possible to achieve altogether, so a more flexible criterion is to maximize the sum of distance since the last meet over all pairs in the same group of the partition. This problem is also called the maximally diverse grouping problem (MDGP) in operations research.

# Make the partition using a mdgp solver
partition <-  rsocialroulette(current_frame, past_partitions, m=4, algorithm="mdgp")

The partition can be visualized using the igraph package:

Of course the partition can also be stored to file as before, in order to include it in the set of past partitions when doing the partitioning next week:

list(partition) %>% 
  setNames(current_frame %>% slice(1) %>% pull(date)) %>%
  socialroulette::partitions_to_pairs() %>% 
  write_csv(file=file.path(temp_dir, stringr::str_c("socialroulette-", .$date[1],".csv")))

or can be stored in a Zoom compatible breakout room specification format:

partition %>% 
  socialroulette::partition_to_frame() %>% 
  rename(email=id) %>% 
  mutate(room = sprintf("room%.03d",group)) %>% 
  select(room, email) %>% 
  write_csv(file=file.path(temp_dir, stringr::str_c("zoom-breakoutrooms.csv")))

Virtual Coffee Roulette

For fully-remote companies, virtual coffee rounds can be a good way to bring together employees across different divisions in an informal way. One can use the proposed rsocialroulette sampling to generate a weekly partition and then use any of the mail sending packages in R to subsequently mail the participants of each group that they have been assigned to a common coffee group and should schedule their own virtual meeting during the week.