Feature: Plot multiple bars with one call #11048

Closed
wants to merge 2 commits into from

@UTSC-SISYPHUS UTSC-SISYPHUS commented Apr 13, 2018

PR Summary

Added the ability to give a 2D matrix to pyplot.bar to plot multiple bar graphs with one call.

Example :

x = [1, 2, 3]
heights = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
plt.bar(x, heights)

will create
image

#10610

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)

@ImportanceOfBeingErnest
Member

I'm a bit confused by this taking the data row-wise.
The plot command accepts the data in columns, i.e. for plot(x,y) you need to have the first dimension of y to match the length of x.
Also for creating multiple histograms, the data for each histogram is in a single column of the array.
Finally, pandas bar plot function also uses the data column-wise.

So for consistency I would suggest to do the same here as well.
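To illustrate the column-wise convention described above, here is a minimal sketch using the existing plot API. The numbers are the PR's example heights, transposed so that each dataset sits in one column. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3])              # N = 3 positions
y = np.array([[1, 4, 7, 10],
              [2, 5, 8, 11],
              [3, 6, 9, 12]])        # shape (N, M) = (3, 4): one dataset per column

fig, ax = plt.subplots()
lines = ax.plot(x, y)                # plot draws one line per column of y
```

Here the first dimension of y matches the length of x, and lines holds one Line2D object per column.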

@UTSC-SISYPHUS
Author

I did it this way because I think most users will have the data for each set of bars in a series of arrays and would like to just pass them all into one bar call. The post referenced in the feature request (https://stackoverflow.com/questions/14270391/python-matplotlib-multiple-bars?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) does it that way, so that's what I went with. I can change it if that's what you want, but I'd like to be sure that others agree with you about it first.

@jklymak
Member

jklymak commented Apr 14, 2018

I'd need to double check, but I somewhat suspect you broke the API for colors etc as well. Or at the very least, you need to document the changes.

@UTSC-SISYPHUS
Author

In terms of plotting just a single bar graph, nothing is changed. I could, however, add new documentation describing how optional parameters are treated by the multibar code.

@anntzer
Contributor

anntzer commented Apr 14, 2018

I think there's a pretty bad issue right now of rows vs columns in this kind of data (see e.g. #8092) and would suggest we figure out the best way to handle that first (my opinion is in the linked issue, but mostly I just want things to be consistent :-)) before making the situation even more tangled...

@ImportanceOfBeingErnest
Member

A cultural standard is to organize data in tables. Those tables are read column by column.

Example (from the famous work by Jean Baptiste Perrin about Brownian Motion)

image

While of course equally possible, I rarely see anyone creating such a table by arranging the independent data horizontally.
Any modern computer tool I know of that is used for data storage or analysis uses tables in this way, including database systems, spreadsheet software, or any specialized tool I have worked with so far.

Example from Spreadsheet software:

image

More specifically, reading data into numpy from such a table with n rows and m columns, e.g. via loadtxt, creates a numpy array of shape (n,m), so we get the data of each table column along the first dimension of the numpy array. Equally, pandas uses the concept of columns containing data from a series.

To be honest, I'm at a loss as to why one would suddenly use the transpose of that.

(I know that matplotlib already has this transposed concept e.g. in the case of asymmetrical errorbars, where data needs to be row-wise, but that is surely confusing.)
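As a small illustration of the loadtxt point, the following sketch reads a made-up whitespace-separated table (the values are invented for the example) and shows that the entries of each table column run along the first dimension of the resulting array.

```python
import io
import numpy as np

# A table with n = 4 rows and m = 3 columns; the values are illustrative only.
text = io.StringIO("1 10 100\n2 20 200\n3 30 300\n4 40 400\n")
table = np.loadtxt(text)

shape = table.shape           # (n, m)
first_column = table[:, 0]    # the n values of the first table column
```

Indexing with table[:, j] selects one table column, which is the layout the column-wise bar proposal expects for each dataset.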

@UTSC-SISYPHUS
Author

Ok. I made it take the data column-wise.

The height(s) of the bars.
2-dimensional arrays represent grouped bars.

I think this should be

scalar or array-like of shape(N,) or shape(N,M)

right? (same below for barh)
And the role of N and M should be explained here. Something like

..., where N is the number of bar positions (same as the length of x) and M the number of groups.

@ImportanceOfBeingErnest
Member

Data Structure

I think this is good. It is consistent with how plot, hist, etc. handle 2D numpy arrays, and as outlined above this is in my opinion the correct way anyway. I do see the problem mentioned by @anntzer and discussed in #8092 for boxplots. I would argue that in the case of bar plots, there is no need to even support unequal-length datasets (since you never actually want to leave out the last bars, and hence need to use np.nan for missing bars anyway, leading to nice 2D arrays).
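A minimal sketch of that idea, assuming each group is given as a mapping from bar position to height (the names positions and groups and all values are made up for illustration): positions without a value become np.nan, so the result is always a regular (N, M) array.

```python
import numpy as np

positions = ["a", "b", "c"]            # N = 3 bar positions
groups = [                             # M = 2 groups; the second has no bar at "b"
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 4.0, "c": 6.0},
]

# One row per position, one column per group; missing entries become NaN.
heights = np.array([[g.get(p, np.nan) for g in groups] for p in positions])
```

The missing bar is then an explicit NaN entry rather than a shorter dataset, which keeps the column-wise layout intact.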

Documentation

A minor point (but not negligible) is the example:
There is currently a bar chart example in the docs. It shows how to produce grouped bars (in the old-fashioned way).
I see two options:

  • Extend the existing example, i.e. below it, write something like: "The above can now also be obtained by a single call to bar using a two-dimensional dataset for the heights with one column for men's data and another for women's data." And then show how to produce the exact same plot with this new feature.
  • Link the old example to the new: "Note that grouped bar charts can now also be achieved through other means, see the (link:)grouped bar example". Then also link the new example to the old, e.g. "This now makes the creation of grouped bar charts like the one from the (link:) barchart example even easier."

(I think I slightly prefer the first option, simply because it keeps the number of different gallery examples low. But maybe it's also good to show an example with more than two groups?)

Finally, I think this grouping mostly makes sense for categorical data. Since matplotlib supports categorical data in the sense of strings supplied to plotting functions, wouldn't it be nice to have this incorporated in the example? So using ax.bar(labels, heights) instead of ax.bar(x, heights, label=labels).

@tacaswell
Member

While I understand the motivation, I have several concerns about this.

If this goes in, I suspect we will almost immediately get a request for this, but stacked. The obvious way to add that is to add some kwargs that are relevant only to the grouped case, which is something we definitely want to avoid.

I am also concerned about the change in return type based on the type of the input.

The isinstance checks are troubling; we try very hard to avoid those. Is this always the broadcast semantic that we want? I think this will lead to a case where you can specify height column-wise, but row-wise for everything else.

How does this play with datastructures that are easy to get out of pandas dataframes?

I think this may be better as a new top-level function to avoid the variable return types, easily support the stacked case, and to (without concern for back compatibility) deal with the broadcasting of all the parameters issues.

@ImportanceOfBeingErnest
Member

How does this play with datastructures that are easy to get out of pandas dataframes?

With the function now taking the data in columns, this plays very well with pandas. As stated above, this is one of the motivations to use columns instead of rows. If df is a dataframe, the call is simply

plt.bar(df.index, df.values)

It seems though that it does not allow for the data argument to be provided. I guess this could be added, and take array-like arguments, like plt.bar("x", ["y1", "y2"], colors=["colorcolumn1","colorcolumn2"], data=df); is that what is meant here?
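For comparison, here is a hedged sketch of what the existing data argument already permits for a single column per call. A list of column names such as ["y1", "y2"] is the proposal above, not current API, so the sketch loops over the names instead. The dict table and its keys are made up for illustration; it stands in for any object that can be indexed by string, such as a dataframe. With only two columns, align="edge" with a negative and a positive width places the bars on either side of each position.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Illustrative data; any container indexable by string could be passed as `data`.
table = {"x": [0, 1, 2], "y1": [1, 2, 3], "y2": [4, 5, 6]}

fig, ax = plt.subplots()
containers = {}
for name, width in [("y1", -0.4), ("y2", 0.4)]:
    # The strings "x" and name are looked up in `table`;
    # a negative width with align="edge" draws the bar to the left of x.
    containers[name] = ax.bar("x", name, width=width, align="edge", data=table)
```

Each call returns its own BarContainer, so supporting a list of names in one call would also raise the return-type question discussed in this thread.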

I cannot say much about the other points - probably they are all valid.

@UTSC-SISYPHUS
Author

If you guys could give me a checklist with what you want me to do I would be happy to oblige.

@tacaswell
Member

We talked about this on the weekly call, and the consensus is that this is very useful functionality, but the front-end complexity people are going to want is quite large and will rapidly grow (give a mouse a cookie....), and adding a large branch that recursively calls back into itself is too much complexity on the implementation side.

The route we would like to see this take is to create a new function (multi_bar or something (I'm really bad at names)) that is only for multi bar (both stacked and adjacent). As mentioned above, this gets around almost all of the back-compatibility issues and avoids shoe-horning the multi-bar API into the existing bar API. A good measure of whether the API is flexible enough is whether we can replace the multi-dataset plotting in hist with a call to this function.

What kwargs from bar do we want to carry over? bottom and align seem like candidates of things to drop, what else can go?

What are the desired semantics on:

  • the data shape (I think this one is pretty pinned down as column-wise)
  • how kwargs/style are broadcast. Setting one style per column makes sense, but would you ever want to broadcast by row? If so, how would you differentiate user intent (particularly for square inputs!)
  • how to handle the data kwarg through everything? (I like @ImportanceOfBeingErnest 's suggestion above)

Is it worth making a new Artist class for this so that users can change from stacked to adjacent interactively?

I think the next step is to write out the docstring for this function!
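One possible shape for such a function, purely as a sketch under assumptions: the name multi_bar_sketch and its mode, total_width and colors parameters are hypothetical and not part of matplotlib. It takes heights column-wise with shape (N, M), broadcasts one color per column, supports both adjacent and stacked layouts, and always returns a list of BarContainer objects so the return type does not depend on the input.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def multi_bar_sketch(ax, x, heights, *, mode="adjacent", total_width=0.8, colors=None):
    """Hypothetical helper: draw each column of `heights` as one set of bars."""
    x = np.asarray(x, dtype=float)
    heights = np.asarray(heights, dtype=float)
    if heights.ndim != 2 or heights.shape[0] != len(x):
        raise ValueError("heights must have shape (N, M) with N == len(x)")
    n_groups = heights.shape[1]
    if colors is None:
        colors = [None] * n_groups          # None lets bar pick the next cycle color
    elif len(colors) != n_groups:
        raise ValueError("colors must have one entry per column")

    containers = []
    if mode == "adjacent":
        width = total_width / n_groups
        for j in range(n_groups):
            # Center the M bars of each group around the position x.
            offset = (j - (n_groups - 1) / 2) * width
            containers.append(ax.bar(x + offset, heights[:, j], width, color=colors[j]))
    elif mode == "stacked":
        bottom = np.zeros(len(x))
        for j in range(n_groups):
            containers.append(
                ax.bar(x, heights[:, j], total_width, bottom=bottom, color=colors[j]))
            # Treat missing (NaN) bars as zero height when accumulating the stack.
            bottom = bottom + np.nan_to_num(heights[:, j])
    else:
        raise ValueError("mode must be 'adjacent' or 'stacked'")
    return containers


fig, ax = plt.subplots()
x = [0, 1, 2]
heights = [[1, 4], [2, 5], [3, 6]]          # N = 3 positions, M = 2 columns
stacked = multi_bar_sketch(ax, x, heights, mode="stacked")
```

The sketch leaves out bottom and align, in line with the question above about which bar kwargs to drop, and it does not attempt to handle the data kwarg or row-wise style broadcasting.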

Member

@tacaswell tacaswell left a comment

See comments for details

tldr; make this a separate function

@jklymak
Member

jklymak commented May 9, 2018

Still thinking of working on this one @UTSC-SISYPHUS ?

@UTSC-SISYPHUS
Author

Sorry, I had exams. I will still work on it. I'll get to it this weekend. I could do with a bit of information on the dataframe @ImportanceOfBeingErnest mentioned, for the purpose of supporting the data kwarg, though.

@UTSC-SISYPHUS
Author

Added documentation for new function. Please give me feedback.

@ImportanceOfBeingErnest
Member

It looks like this currently consists only of the documentation. The function body is simply pass.
I'm not sure what the status is here.

@UTSC-SISYPHUS
Author

I had this working in another function but changes were requested. I'm asking if I should write the code for the documentation as it is now or if I should make some more changes first.

@github-actions

github-actions bot commented May 4, 2023

Since this Pull Request has not been updated in 60 days, it has been marked "inactive." This does not mean that it will be closed, though it may be moved to a "Draft" state. This helps maintainers prioritize their reviewing efforts. You can pick the PR back up anytime - please ping us if you need a review or guidance to move the PR forward! If you do not plan on continuing the work, please let us know so that we can either find someone to take the PR over, or close it.

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label May 4, 2023
@timhoffm
Member

timhoffm commented May 4, 2023

Superseded by #24313. Please move further discussion there.
