Mon dataframe contient les éléments suivants:
1) Chaque user_id peut avoir plusieurs order_id
2) Chaque order_id peut être de 2 types: les commandes mensuelles ont un cycle de 1 mois (30 jours) ou 3 mois (90 jours). Les utilisateurs - au cours de sa vie - peuvent passer d'un plan_type à l'autre plusieurs fois.
df_final <- tibble::tribble(
~order_id, ~behavior_type, ~order_type,
123, "none", "acquisition",
124, "none", "repeat",
125, "upgrade", "repeat",
126, "none", "acquisition",
127, "downgrade", "repeat",
128, "none", "repeat",
129, "none", "acquisition",
130, "none", "repeat",
131, "none", "repeat",
132, "none", "acquisition",
133, "none", "repeat"
)
> df_final
# A tibble: 11 x 3
order_id behavior_type order_type
<dbl> <chr> <chr>
1 123 none acquisition
2 124 none repeat
3 125 upgrade repeat
4 126 none acquisition
5 127 downgrade repeat
6 128 none repeat
7 129 none acquisition
8 130 none repeat
9 131 none repeat
10 132 none acquisition
11 133 none repeat
Je souhaite créer 2 colonnes supplémentaires comme suit:
1) order_type : Je souhaite classer le order_ids regroupement par user_id et date ; après avoir classé la commande par date , je veux les nommer acquisition si rank = 1 (premier ordre), répéter sinon.
J'ai réussi à le faire comme suit:
df_input %>% group_by(user_id) %>% mutate(rank = row_number(date)) %>% mutate(order_type = ifelse(rank == '1','acquisition','repeats'))
2) Je souhaite créer une colonne behavior_type . Je souhaite marquer chaque order_id comme upgrade , downgrade ou none .
Si l'utilisateur passe de mensuel à 3 mois , la commande sera mise à jour , si à partir de 3 mois code > à mensuellement un downgrade , none sinon.
Ma base de données finale devra ressembler à:
library(tidyverse)
df_input <- tibble::tribble(
~user_id, ~order_id, ~date, ~plan_type,
1, 123, "01-01-2020", "monthly",
1, 124, "01-31-2020", "monthly",
1, 125, "03-01-2020", "3-months",
2, 126, "01-11-2019", "3-months",
2, 127, "10-13-2018", "monthly",
2, 128, "11-12-2018", "monthly",
3, 129, "01-10-2019", "3-months",
3, 130, "04-10-2019", "3-months",
3, 131, "07-09-2019", "3-months",
4, 132, "01-02-2020", "monthly",
4, 133, "02-01-2020", "monthly"
)
> df_input
# A tibble: 11 x 4
user_id order_id date plan_type
<dbl> <dbl> <chr> <chr>
1 1 123 01-01-2020 monthly
2 1 124 01-31-2020 monthly
3 1 125 03-01-2020 3-months
4 2 126 01-11-2019 3-months
5 2 127 10-13-2018 monthly
6 2 128 11-12-2018 monthly
7 3 129 01-10-2019 3-months
8 3 130 04-10-2019 3-months
9 3 131 07-09-2019 3-months
10 4 132 01-02-2020 monthly
11 4 133 02-01-2020 monthly
Toute aide pour cette deuxième étape
3 Réponses :
Comme ça?
library(dplyr)
#just repeating your first step
df_interim <- df_input %>%
group_by(user_id) %>%
mutate(rank = row_number(date)) %>%
mutate(order_type = ifelse(rank == '1','acquisition','repeats'))
df_interim %>%
mutate(behavior_type = case_when(plan_type == "monthly" & lag(plan_type) == "3-months" ~ "downgrade",
plan_type == "3-months" & lag(plan_type) == "monthly" ~ "upgrade",
TRUE ~ "none")) %>%
ungroup() %>%
select(order_id, behavior_type, order_type)
# A tibble: 11 x 3
order_id behavior_type order_type
<dbl> <chr> <chr>
1 123 none acquisition
2 124 none repeats
3 125 upgrade repeats
4 126 none acquisition
5 127 downgrade repeats
6 128 none repeats
7 129 none acquisition
8 130 none repeats
9 131 none repeats
10 132 none acquisition
11 133 none repeats
La partie type de commande pourrait être plus simple comme ceci:
group_by(user_id) %>% mutate(order_type = if_else(as.Date(date, "%m-%d-%Y") == min(as.Date(date, "%m-%d-%Y")), "acquisition", "repeat"))
library(dplyr)
df_input %>%
group_by(user_id) %>%
mutate(rank = row_number(date)) %>%
mutate(order_type = if_else(rank == "1","acquisition","repeats"),
behavior_type = if_else(rank == "1", "none",
if_else(plan_type != lag(plan_type) & plan_type == "3-months", "upgrade",
if_else(plan_type != lag(plan_type) & plan_type == "monthly", "downgrade",
"none"))))
# A tibble: 11 x 7
# Groups: user_id [4]
user_id order_id date plan_type rank order_type behavior_type
<dbl> <dbl> <chr> <chr> <int> <chr> <chr>
1 1 123 01-01-2020 monthly 1 acquisition none
2 1 124 01-31-2020 monthly 2 repeats none
3 1 125 03-01-2020 3-months 3 repeats upgrade
4 2 126 01-11-2019 3-months 1 acquisition none
5 2 127 10-13-2018 monthly 2 repeats downgrade
6 2 128 11-12-2018 monthly 3 repeats none
7 3 129 01-10-2019 3-months 1 acquisition none
8 3 130 04-10-2019 3-months 2 repeats none
9 3 131 07-09-2019 3-months 3 repeats none
10 4 132 01-02-2020 monthly 1 acquisition none
11 4 133 02-01-2020 monthly 2 repeats none