allow using categorical data for color, marker, size, … #6214

Open
@flying-sheep

Description

currently, specifying an array for color or size only works for continuous data. using markers and colors mapped to subsets based on categorical data is hard and manual (basically, you subset the data in a loop and call plot multiple times).
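
for reference, the manual workaround described above might look roughly like the sketch below. the data, the `species` labels and the hand-picked marker list are made up for illustration; only `ax.scatter`, `ax.legend` and `np.unique` are existing API.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = rng.normal(size=30)
species = np.array(["a", "b", "c"] * 10)  # hypothetical categorical labels

fig, ax = plt.subplots()
markers = ["o", "s", "^"]  # hand-picked, one marker per category
for marker, label in zip(markers, np.unique(species)):
    mask = species == label
    # one scatter call per category; the color advances through the
    # property cycle on each call because c is not given
    ax.scatter(x[mask], y[mask], marker=marker, label=label)
ax.legend(title="species")
```

every category needs its own call, its own mask and its own marker choice, which is the bookkeeping this proposal would like to remove.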

i thought it would be hard, as a list of strings already means something for some parameters (markers and colors can be specified as strings). but there is pandas’ category dtype.

i propose that a categorical vector passed as c or marker should be automatically mapped to a palette of colors or markers.

# with A, B, C being either categorical or quantitative,
# S being quantitative, and M being categorical
plt.scatter('A', 'B', s='S', c='C', marker='M', data=df)

# of course also works with arrays, and bool acts like categorical data:
plt.scatter(arr[:,0], arr[:,1], c=arr[:,-1] > 0)

more palettes than just the colormap

the color palette for categorical data would be plt.rcParams['axes.prop_cycle'].by_key()['color'], and the marker palette would be a new rcParam (maybe?).

those palettes would be cycled if there are too many categories.
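
a minimal sketch of how that cycling could work. `categorical_colors` is a hypothetical helper, not an existing matplotlib function; it takes integer codes from `np.unique` and wraps them around the palette with a modulo.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def categorical_colors(values):
    """Map each value to a color from the current property cycle (hypothetical helper)."""
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    # the modulo wraps around, so the palette repeats once there are
    # more categories than colors
    colors = [palette[code % len(palette)] for code in codes]
    return categories, colors


n_colors = len(plt.rcParams["axes.prop_cycle"].by_key()["color"])
labels = np.arange(n_colors + 2)  # two more categories than palette entries
categories, colors = categorical_colors(labels)

fig, ax = plt.subplots()
ax.scatter(labels, labels, c=colors)  # c accepts a list of color specs
```

here the last two categories reuse the first two palette colors. a bool array works the same way, since `np.unique` yields the two categories `False` and `True`.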

legends

  • categorical: legend
  • discrete: ordered legend or segmented bar
  • continuous: bar (colorbar for color, isosceles trapezoid for size)
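
the two ends of that list can already be built by hand today. the sketch below covers only the categorical legend (via proxy artists) and the continuous colorbar; the data and variable names are made up, and the size bar is left out because it has no existing counterpart that i know of.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

x = np.arange(6)
labels = np.array(["a", "b", "a", "c", "b", "c"])  # hypothetical categories
palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
categories, codes = np.unique(labels, return_inverse=True)

fig, (ax_cat, ax_cont) = plt.subplots(1, 2)

# categorical: one legend entry per category, built from proxy artists
ax_cat.scatter(x, x, c=[palette[i % len(palette)] for i in codes])
handles = [
    Line2D([], [], linestyle="", marker="o",
           color=palette[i % len(palette)], label=str(cat))
    for i, cat in enumerate(categories)
]
ax_cat.legend(handles=handles)

# continuous: a colorbar tied to the scatter's colormap and norm
sc = ax_cont.scatter(x, x, c=x.astype(float))
colorbar = fig.colorbar(sc, ax=ax_cont)
```

under the proposal, the choice between these two would follow from the dtype of the data instead of being written out manually.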

open questions

  • are integers treated as quantitative + discrete (legend containing all separate values) or continuous (bar)?
  • how to specify if a legend/colorbar/sizebar should be created? it makes sense to do it automatically if we use the data= interface

    Labels

    Difficulty: Hard (https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues), New feature, keep (items to be ignored by the “Stale” GitHub Action), topic: categorical
