Description
currently, specifying an array for color or size only works for continuous data. mapping markers and colors to subsets based on categorical data is hard and manual (basically you subset the data in a loop and call plot multiple times).
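for reference, a minimal sketch of that manual workaround. the data, the column names and the marker list are made up for illustration; only the loop structure matters.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# made-up example data; column names are illustrative only
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "M": pd.Categorical(["x", "y", "x", "z", "y", "z"]),
})

fig, ax = plt.subplots()
markers = ["o", "s", "^"]  # hand-picked, one per category
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]

# one scatter call per category: this is the manual part
groups = df.groupby("M", observed=True)
for marker, color, (name, group) in zip(markers, colors, groups):
    ax.scatter(group["A"], group["B"], marker=marker, color=color, label=name)

ax.legend(title="M")
```

every plot that needs this has to repeat the subsetting, the palette bookkeeping and the legend call by hand.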
i thought it would be hard as list-of-string already means something for some parameters (markers and colors can be specified as strings). but there is pandas’ category dtype.
i propose that a categorical vector passed for one of these parameters should be automatically mapped to a palette of colors/markers when specified as c or marker.
# with A, B, C, being either categorical or quantitative,
# S being quantitative, and M being categorical
plt.scatter('A', 'B', s='S', c='C', marker='M', data=df)
# of course also works with arrays, and bool acts like categorical data:
plt.scatter(arr[:,0], arr[:,1], c=arr[:,-1] > 0)
more palettes than just the colormap
the color palette for categorical data would be plt.rcParams['axes.prop_cycle'].by_key()['color'], the marker palette a new rcParam (maybe?)
those palettes would be cycled if there are too many categories.
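a rough sketch of what that cycling could look like. categorical_palette is a hypothetical helper, not an existing matplotlib function, and the marker list is only a stand-in for the proposed (not yet existing) marker rcParam.

```python
from itertools import cycle

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd


def categorical_palette(values, palette):
    """Hypothetical helper: map each category to a palette entry,
    wrapping around when there are more categories than entries."""
    categories = pd.Categorical(values).categories
    # zip stops at the last category, cycle() repeats the palette as needed
    return dict(zip(categories, cycle(palette)))


color_palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
marker_palette = ["o", "s", "^"]  # stand-in for the proposed marker rcParam

values = ["a", "b", "c", "d", "a"]
color_of = categorical_palette(values, color_palette)
marker_of = categorical_palette(values, marker_palette)
```

with four categories and three markers, the fourth category wraps around to the first marker again.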
legends
- categorical: legend
- discrete: (ordered legend or segmented bar)
- continuous: bar (colorbar for color, isosceles trapezoid for size)
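one way the choice between the three kinds could be derived from the dtype. legend_kind is a hypothetical helper, and treating integers as discrete is only an assumption made for this sketch, not a decision.

```python
import numpy as np
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_integer_dtype,
    is_numeric_dtype,
    is_object_dtype,
    is_string_dtype,
)


def legend_kind(values):
    """Hypothetical helper: guess which kind of legend a vector needs."""
    dtype = pd.Series(values).dtype
    if isinstance(dtype, pd.CategoricalDtype) or is_bool_dtype(dtype):
        return "categorical"
    if is_object_dtype(dtype) or is_string_dtype(dtype):
        return "categorical"
    if is_integer_dtype(dtype):
        return "discrete"  # assumption: integers get a legend, not a bar
    if is_numeric_dtype(dtype):
        return "continuous"
    raise TypeError(f"cannot map dtype {dtype} to a legend kind")


kinds = [
    legend_kind(pd.Categorical(["a", "b"])),
    legend_kind(np.array([0.1, -0.2]) > 0),
    legend_kind(np.array([1, 2, 3])),
    legend_kind(np.array([0.5, 1.5])),
]
```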
open questions
- are integers treated as quantitative + discrete (legend containing all separate values) or continuous (bar)?
- how to specify whether a legend/colorbar/sizebar should be created? it makes sense to do it automatically if we use the data= interface