I have a question about data cleaning in R.

Some columns in my data contain several different elements mixed together. I want to separate them so that each column holds only one element, but I don’t know how.


Please let me know if there is an efficient way to turn a data frame column that holds multiple elements into columns that each hold a single element.

Hello @innerhoops1219. Apologies, but I do not understand the question. Using the screenshot provided, can you provide some examples of what you would like the end result to look like? What are you interested in changing exactly? Thanks!!!
-Casey


Hello, Casey!
Thank you for your comment.

The data for multiple elements are mixed together in one column.
As a final result, I want each column to contain only one element.

csv.data
Green label [E : J] (Situation)
Yellow label [K : AF] (Number)

Now

E (Breakpoint/sideout): factor with 2 levels [Breakpoint, Sideout]
F (Phase): factor with 2 levels [Reception, Transition]
G (Attacker was passer): factor with 2 levels [TRUE, FALSE]
H (Pass quality): factor with 4 levels [Perfect, Good, OK, Poor]
I (Attack code): factor with 11 levels [L7, L8, L9, P1, P5, P9, PA, PP, PU, PV, PZ]
J (Setter position): factor with 6 levels [1, 2, 3, 4, 5, 6]

K:AF (Number)

This is the end result I would like:

ex)
Green label (E: Sideout) + Yellow label K:AF (Number)
Green label (E: Breakpoint) + Yellow label K:AF (Number)
Green label (F: Reception) + Yellow label K:AF (Number)
Green label (F: Transition) + Yellow label K:AF (Number)
Green label (G: TRUE) + Yellow label K:AF (Number)
Green label (G: FALSE) + Yellow label K:AF (Number)
Green label (H: Perfect) + Yellow label K:AF (Number)
.
.
.

Sorry for the poor explanation…

Book1.xlsx (38.8 KB)

Hello @innerhoops1219. The end result is still unclear to me. But if I understand correctly, I think you may be looking for the functionality of pivot_wider() from the tidyr package. Complete information is provided in the vignette.

Below is the code for two examples: (1) split the column Breakpoint/sideout into separate columns called Sideout and Breakpoint, and (2) split the column Phase into two columns Reception and Transition.

library(readxl)
data <- read_xlsx("Book1.xlsx")

library(dplyr)
library(tidyr)

# Create separate columns called "Breakpoint" and "Sideout"
# The original column "`Breakpoint/sideout`" is removed
data <- data %>% pivot_wider(names_from = `Breakpoint/sideout`, 
                             values_from = `Breakpoint/sideout`)

# Create separate columns called "Reception" and "Transition"
# The original column "Phase" is removed
data <- data %>% pivot_wider(names_from = `Phase`, 
                             values_from = `Phase`)
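
To make the effect of pivot_wider() easier to see, here is a minimal sketch on a made-up data frame. The names toy, row_id, Kills, and flag are hypothetical and are not taken from Book1.xlsx. The sketch assumes an explicit row id so that every row stays distinct, and it spreads a helper column of 1s instead of the category column itself, which gives 0/1 indicator columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for Book1.xlsx (all names are assumptions)
toy <- tibble(
  row_id = 1:4,  # explicit id so each row is uniquely identified
  Phase  = c("Reception", "Transition", "Reception", "Transition"),
  Kills  = c(3, 1, 2, 0)  # stand-in for one of the numeric columns
)

wide <- toy %>%
  mutate(flag = 1L) %>%                         # helper value column to spread
  pivot_wider(names_from  = Phase,              # one new column per level of Phase
              values_from = flag,               # cell value is 1 where the level occurs
              values_fill = list(flag = 0L))    # and 0 everywhere else

print(wide)
```

Each level of Phase becomes its own column, and the numeric column is carried along unchanged, so the rows for one level can then be picked out with, for example, filter(wide, Reception == 1).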

This function is concise but powerful. Does this provide the desired result you are looking for?
Best,
-Casey


Hello, Casey!

Thank you for getting back to me.

I’m sorry for my lack of explanation.
It was a little different from what I had in mind, but thanks to you, I got a hint.

As I progress through the R course, I will try to put together my thoughts so that I can explain them clearly.

When that time comes, I would be happy to get your help again.

Thank you for your kindness.

Thanks!
Shinnosuke
