How can I convert a list of dictionaries into a DataFrame?
An answer to this question on Stack Overflow.
Question
I have a list of dictionaries with a format similar to the following. The list is generated by other functions that I don't want to change, so the existence of the list and its dicts can be taken as a given.
dictlist = []
for i in 1:20
    push!(dictlist, Dict(:a => i, :b => 2*i))
end
Is there a syntactically clean way of converting this list into a DataFrame?
Answer
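This function provides one possible solution. It collects every key that appears in any of the dictionaries, builds one column per key (filling in a placeholder wherever a dictionary lacks that key, so that all columns have the same length), and then tries to give each column a concrete element type before handing the columns to DataFrame: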
using DataFrames
function DictionariesToDataFrame(dictlist)
    ret = Dict() # Holds the DataFrame's columns while we build it
    # Get all unique keys from dictlist and make each one an empty column in ret
    for k in unique([k for d in dictlist for k in keys(d)])
        ret[k] = []
    end
    for row in dictlist # Loop through each row
        for (key, value) in ret # Use ret to check all possible keys in row
            if haskey(row, key) # Is key present in row?
                push!(value, row[key]) # Yes
            else # Nope
                push!(value, missing) # So add missing. Keeps columns the same length.
            end
        end
    end
    # Fix the data types of the columns: identity.(v) narrows a Vector{Any}
    # to the tightest element type, e.g. Vector{Int} or Vector{Union{Missing, Int}}
    for k in collect(keys(ret)) # Iterate over a copy of the keys while updating ret
        ret[k] = identity.(ret[k])
    end
    # DataFrame is ready to go!
    return DataFrame(ret)
end
# Generate some data: rows 11-20 also get a "c" key, rows 1-10 do not
dictlist = []
for i in 1:20
    push!(dictlist, Dict("a" => i, "b" => 2*i))
    if i > 10
        dictlist[end]["c"] = 3*i # Add "c" to the row just pushed
    end
end
DictionariesToDataFrame(dictlist)
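The same idea can also be written more compactly with comprehensions. This is a sketch assuming Julia 1.x and DataFrames.jl 1.x, where `missing` is the standard marker for absent values; the helper name `dicts_to_dataframe` is hypothetical, and the sample data gives rows 11-20 a "c" key.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): one column per key that
# appears in any dictionary; absent entries become `missing`.
function dicts_to_dataframe(dictlist)
    # Every distinct key, sorted so the column order is deterministic
    # (assumes all keys are mutually comparable, e.g. all Strings or all Symbols)
    ks = sort!(unique([k for d in dictlist for k in keys(d)]))
    # get(d, k, missing) fills the gaps; each comprehension picks the tightest
    # element type, e.g. Vector{Int} or Vector{Union{Missing, Int}}
    return DataFrame((k => [get(d, k, missing) for d in dictlist] for k in ks)...)
end

# Sample data: only rows 11-20 have a "c" key
dictlist = []
for i in 1:20
    push!(dictlist, Dict("a" => i, "b" => 2i))
    if i > 10
        dictlist[end]["c"] = 3i
    end
end

df = dicts_to_dataframe(dictlist)
```

Because the element types come straight out of the comprehensions, no separate type-fixing pass is needed: columns a and b end up with element type Int, while column c is a Union{Missing, Int} column with `missing` in rows 1-10.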