How can I convert a list of dictionaries into a DataFrame?

An answer to this question on Stack Overflow.

Question

I have a list of dictionaries with a format similar to the following. The list is generated by other functions that I don't want to change, so the existence of the list and its dicts can be taken as given.

dictlist=[]
for i in 1:20
  push!(dictlist, Dict(:a=>i, :b=>2*i))
end

Is there a syntactically clean way of converting this list into a DataFrame?

Answer

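The following function is one possible solution. It collects every key that appears in any of the dictionaries, builds one column per key (padding rows that lack a key with nothing so all columns stay the same length), and then narrows each column's element type when all of its values share a single type: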

using DataFrames
function DictionariesToDataFrame(dictlist)
  ret = Dict()                 #Holds dataframe's columns while we build it
  #Get all unique keys from dictlist and make them entries in ret
  for key in unique(k for d in dictlist for k in keys(d))
    ret[key] = []
  end
  for row in dictlist          #Loop through each row
    for (key,value) in ret     #Use ret to check all possible keys in row
      if haskey(row,key)       #Is key present in row?
        push!(value, row[key]) #Yes
      else                     #Nope
        push!(value, nothing)  #So add nothing. Keeps columns same length.
      end
    end
  end
  #Fix the data types of the columns
  for (k,v) in ret                             #Consider each column
    row_type = unique([typeof(x) for x in v])  #Get datatypes of each row
    if length(row_type)==1                     #All rows had same datatype
      row_type = row_type[1]                   #Fetch datatype
      ret[k]   = convert(Array{row_type,1}, v) #Convert column to that type
    end
  end
  #DataFrame is ready to go!
  return DataFrames.DataFrame(ret)
end
#Generate some data
dictlist=[]
for i in 1:20
  push!(dictlist, Dict("a"=>i, "b"=>2*i))
  if i>10
    dictlist[end-1]["c"]=3*i   #Adds "c" to the previous row, so only rows 10-19 have it
  end
end
DictionariesToDataFrame(dictlist)
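
The column-building step can also be written more compactly. The following is a minimal sketch, not part of the original answer: `dict_columns` is a hypothetical helper name, and it fills absent keys with `missing` (Julia's standard missing-value marker) rather than `nothing`, which is a deliberate change from the function above. It uses only Base Julia; the returned dictionary of columns should be usable with `DataFrame` in the same way as `ret` is above, though that has not been checked against any particular DataFrames version.

```julia
# Hypothetical helper (not from the original answer): build one column per key
# found in any dictionary, filling absent entries with `missing`.
function dict_columns(dictlist)
    # Union of all keys, in the order they are first seen
    allkeys = unique(k for d in dictlist for k in keys(d))
    # `get` returns `missing` when a row lacks the key. The comprehension
    # narrows the element type on its own, so no separate type-fixing pass
    # is needed.
    return Dict(k => [get(d, k, missing) for d in dictlist] for k in allkeys)
end

# Sample data: only rows 11 to 20 carry the key "c"
dictlist = []
for i in 1:20
    row = Dict("a" => i, "b" => 2i)
    i > 10 && (row["c"] = 3i)
    push!(dictlist, row)
end

cols = dict_columns(dictlist)
```

Here every column has 20 entries, and the first ten entries of `cols["c"]` are `missing`. Using `missing` instead of `nothing` lets functions such as `skipmissing` and `ismissing` work on the padded columns.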