tidypolars.tibble_df ==================== .. py:module:: tidypolars.tibble_df Classes ------- .. autoapisummary:: tidypolars.tibble_df.tibble Functions --------- .. autoapisummary:: tidypolars.tibble_df.desc tidypolars.tibble_df.as_tibble tidypolars.tibble_df.is_tibble tidypolars.tibble_df.from_polars tidypolars.tibble_df.from_pandas Module Contents --------------- .. py:class:: tibble(_data=None, **kwargs) Bases: :py:obj:`tidypolars.reexports.pl.DataFrame` A data frame object that provides methods familiar to R tidyverse users. .. py:method:: __dir__() .. py:method:: __repr__() Printing method .. py:method:: _repr_html_() Printing method for Jupyter. Output rows and columns can be modified by setting the following environment variables: * POLARS_FMT_MAX_COLS: set the number of columns * POLARS_FMT_MAX_ROWS: set the number of rows .. py:method:: __copy__() .. py:method:: __str__() Printing method .. py:method:: __getattribute__(attr) .. py:method:: __getitem__(col) Get part of the DataFrame as a new DataFrame, Series, or scalar. :param key: Rows / columns to select. This is easiest to explain via example. Suppose we have a DataFrame with columns `'a'`, `'d'`, `'c'`, `'b'`. Here is what various types of `key` would do: - `df[0, 'a']` extracts the first element of column `'a'` and returns a scalar. - `df[0]` extracts the first row and returns a Dataframe. - `df['a']` extracts column `'a'` and returns a Series. - `df[0:2]` extracts the first two rows and returns a Dataframe. - `df[0:2, 'a']` extracts the first two rows from column `'a'` and returns a Series. - `df[0:2, 0]` extracts the first two rows from the first column and returns a Series. - `df[[0, 1], [0, 1, 2]]` extracts the first two rows and the first three columns and returns a Dataframe. - `df[0: 2, ['a', 'c']]` extracts the first two rows from columns `'a'` and `'c'` and returns a Dataframe. - `df[:, 0: 2]` extracts all rows from the first two columns and returns a Dataframe. 
- `df[:, 'a': 'c']` extracts all rows and all columns positioned between `'a'` and `'c'` *inclusive* and returns a Dataframe. In our example, that would extract columns `'a'`, `'d'`, and `'c'`. :rtype: DataFrame, Series, or scalar, depending on `key`. .. rubric:: Examples >>> df = pl.DataFrame( ... {"a": [1, 2, 3], "d": [4, 5, 6], "c": [1, 3, 2], "b": [7, 8, 9]} ... ) >>> df[0] shape: (1, 4) ┌─────┬─────┬─────┬─────┐ │ a ┆ d ┆ c ┆ b │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 ┆ 7 │ └─────┴─────┴─────┴─────┘ >>> df[0, "a"] 1 >>> df["a"] shape: (3,) Series: 'a' [i64] [ 1 2 3 ] >>> df[0:2] shape: (2, 4) ┌─────┬─────┬─────┬─────┐ │ a ┆ d ┆ c ┆ b │ │ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 ┆ 7 │ │ 2 ┆ 5 ┆ 3 ┆ 8 │ └─────┴─────┴─────┴─────┘ >>> df[0:2, "a"] shape: (2,) Series: 'a' [i64] [ 1 2 ] >>> df[0:2, 0] shape: (2,) Series: 'a' [i64] [ 1 2 ] >>> df[[0, 1], [0, 1, 2]] shape: (2, 3) ┌─────┬─────┬─────┐ │ a ┆ d ┆ c │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 │ │ 2 ┆ 5 ┆ 3 │ └─────┴─────┴─────┘ >>> df[0:2, ["a", "c"]] shape: (2, 2) ┌─────┬─────┐ │ a ┆ c │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═════╪═════╡ │ 1 ┆ 1 │ │ 2 ┆ 3 │ └─────┴─────┘ >>> df[:, 0:2] shape: (3, 2) ┌─────┬─────┐ │ a ┆ d │ │ --- ┆ --- │ │ i64 ┆ i64 │ ╞═════╪═════╡ │ 1 ┆ 4 │ │ 2 ┆ 5 │ │ 3 ┆ 6 │ └─────┴─────┘ >>> df[:, "a":"c"] shape: (3, 3) ┌─────┬─────┬─────┐ │ a ┆ d ┆ c │ │ --- ┆ --- ┆ --- │ │ i64 ┆ i64 ┆ i64 │ ╞═════╪═════╪═════╡ │ 1 ┆ 4 ┆ 1 │ │ 2 ┆ 5 ┆ 3 │ │ 3 ┆ 6 ┆ 2 │ └─────┴─────┴─────┘ .. py:method:: arrange(*args) Arrange/sort rows :param \*args: Columns to sort by :type \*args: str .. rubric:: Examples >>> df = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> # Arrange in ascending order >>> df.arrange('x', 'y') ... >>> # Arrange some columns descending >>> df.arrange(tp.desc('x'), 'y') .. 
py:method:: as_dict(*, as_series=True) Convert to a dictionary of columns :param as_series: If True - returns the dict values as Series If False - returns the dict values as lists :type as_series: bool .. rubric:: Examples >>> df.as_dict() >>> df.as_dict(as_series = False) .. py:method:: as_pandas() Convert to a pandas DataFrame .. rubric:: Examples >>> df.as_pandas() .. py:method:: as_polars() Convert to a polars DataFrame .. rubric:: Examples >>> df.as_polars() .. py:method:: bind_cols(*args) Bind data frames by columns :param \*args: Data frames to bind by column :type \*args: tibble .. rubric:: Examples >>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.tibble({'a': ['c', 'c', 'c'], 'b': range(4, 7)}) >>> df1.bind_cols(df2) .. py:method:: bind_rows(*args) Bind data frames by row :param \*args: Data frames to bind by row :type \*args: tibble, list .. rubric:: Examples >>> df1 = tp.tibble({'x': ['a', 'a', 'b'], 'y': range(3)}) >>> df2 = tp.tibble({'x': ['c', 'c', 'c'], 'y': range(4, 7)}) >>> df1.bind_rows(df2) .. py:method:: clone() Very cheap deep clone .. py:method:: count(*args, sort=False, name='n') Returns row counts of the dataset. If column names are provided, count() returns counts by group. :param \*args: Columns to group by :type \*args: str, Expr :param sort: Whether to sort the output in descending order of count :type sort: bool :param name: The name of the new column in the output. If omitted, it will default to "n". :type name: str .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.count() >>> df.count('b') .. py:method:: distinct(*args) Select distinct/unique rows :param \*args: Columns to find distinct/unique rows :type \*args: str, Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.distinct() >>> df.distinct('b') .. py:method:: drop(*args) Drop unwanted columns :param \*args: Columns to drop :type \*args: str .. rubric:: Examples >>> df.drop('x', 'y') .. 
py:method:: drop_null(*args) Drop rows containing missing values :param \*args: Columns to drop nulls from (defaults to all) :type \*args: str .. rubric:: Examples >>> df = tp.tibble(x = [1, None, 3], y = [None, 'b', 'c'], z = range(3)) >>> df.drop_null() >>> df.drop_null('x', 'y') .. py:method:: equals(other, null_equal=True) Check if two tibbles are equal .. py:method:: glimpse() Return a dense preview of the DataFrame. The formatting shows one line per column so that wide dataframes display cleanly. Each line shows the column name, the data type, and the first few values. .. py:method:: fill(*args, direction='down', _by=None) Fill in missing values with previous or next value :param \*args: Columns to fill :type \*args: str :param direction: Direction to fill. One of ['down', 'up', 'downup', 'updown'] :type direction: str :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': [1, None, 3, 4, 5], ... 'b': [None, 2, None, None, 5], ... 'groups': ['a', 'a', 'a', 'b', 'b']}) >>> df.fill('a', 'b') >>> df.fill('a', 'b', _by = 'groups') >>> df.fill('a', 'b', direction = 'downup') .. py:method:: filter(*args, _by=None) Filter rows on one or more conditions :param \*args: Conditions to filter by :type \*args: Expr :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': ['a', 'a', 'b']}) >>> df.filter(col('a') < 2, col('b') == 'a') >>> df.filter((col('a') < 2) & (col('b') == 'a')) >>> df.filter(col('a') <= tp.mean(col('a')), _by = 'b') .. py:method:: full_join(df, left_on=None, right_on=None, on=None, suffix: str = '_right') Perform a full join :param df: Data frame to join with. :type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. 
:type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.full_join(df2) >>> df1.full_join(df2, on = 'x') >>> df1.full_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: head(n=5, *, _by=None) Alias for `.slice_head()` .. py:method:: inner_join(df, left_on=None, right_on=None, on=None, suffix='_right') Perform an inner join :param df: Data frame to join with. :type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. :type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.inner_join(df2) >>> df1.inner_join(df2, on = 'x') >>> df1.inner_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: left_join(df, left_on=None, right_on=None, on=None, suffix='_right') Perform a left join :param df: Data frame to join with. :type df: tibble :param left_on: Join column(s) of the left DataFrame. :type left_on: str, list :param right_on: Join column(s) of the right DataFrame. :type right_on: str, list :param on: Join column(s) of both DataFrames. If set, `left_on` and `right_on` should be None. :type on: str, list :param suffix: Suffix to append to columns with a duplicate name. :type suffix: str .. rubric:: Examples >>> df1.left_join(df2) >>> df1.left_join(df2, on = 'x') >>> df1.left_join(df2, left_on = 'left_x', right_on = 'x') .. py:method:: mutate(*args, _by=None, **kwargs) Add or modify columns :param \*args: Column expressions to add or modify :type \*args: Expr :param _by: Columns to group by :type _by: str, list :param \*\*kwargs: Column expressions to add or modify :type \*\*kwargs: Expr .. 
rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.mutate(double_a = col('a') * 2, ... a_plus_b = col('a') + col('b')) >>> df.mutate(row_num = row_number(), _by = 'c') .. py:method:: pivot_longer(cols=everything(), names_to='name', values_to='value') Pivot data from wide to long :param cols: List of the columns to pivot. Defaults to all columns. :type cols: Expr :param names_to: Name of the new "names" column. :type names_to: str :param values_to: Name of the new "values" column :type values_to: str .. rubric:: Examples >>> df = tp.tibble({'id': ['id1', 'id2'], 'a': [1, 2], 'b': [1, 2]}) >>> df.pivot_longer(cols = ['a', 'b']) >>> df.pivot_longer(cols = ['a', 'b'], names_to = 'stuff', values_to = 'things') .. py:method:: pivot_wider(names_from='name', values_from='value', id_cols=None, values_fn='first', values_fill=None) Pivot data from long to wide :param names_from: Column to get the new column names from. :type names_from: str :param values_from: Column to get the new column values from :type values_from: str :param id_cols: A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in `names_from` and `values_from`. :type id_cols: str, list :param values_fn: Function for how multiple entries per group should be dealt with. Any of 'first', 'count', 'sum', 'max', 'min', 'mean', 'median', 'last' :type values_fn: str :param values_fill: If values are missing/null, what value should be filled in. Can use: "backward", "forward", "mean", "min", "max", "zero", "one" :type values_fill: str .. rubric:: Examples >>> df = tp.tibble({'id': [1, 1], 'variable': ['a', 'b'], 'value': [1, 2]}) >>> df.pivot_wider(names_from = 'variable', values_from = 'value') .. py:method:: print() .. py:method:: pull(var=None) Extract a column as a series :param var: Name of the column to extract. Defaults to the last column. :type var: str .. 
rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3)}) >>> df.pull('a') .. py:method:: relocate(*args, _before=None, _after=None) Move a column or columns to a new position :param \*args: Columns to move :type \*args: str, Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.relocate('a', _before = 'c') >>> df.relocate('b', _after = 'c') .. py:method:: rename(_mapping=None, **kwargs) Rename columns :param _mapping: Dictionary mapping old names to new names :type _mapping: dict :param \*\*kwargs: Key-value pairs of the form new_name = 'old_name' :type \*\*kwargs: str .. rubric:: Examples >>> df = tp.tibble({'x': range(3), 't': range(3), 'z': ['a', 'a', 'b']}) >>> df.rename(new_x = 'x') # dplyr interface >>> df.rename({'x': 'new_x'}) # pandas interface .. py:method:: replace_null(replace=None) Replace null values :param replace: Dictionary of column/replacement pairs :type replace: dict .. rubric:: Examples >>> df = tp.tibble(x = [0, None], y = [None, None]) >>> df.replace_null(dict(x = 1, y = 2)) .. py:method:: separate(sep_col, into, sep='_', remove=True) Separate a character column into multiple columns :param sep_col: Column to split into multiple columns :type sep_col: str :param into: List of new column names :type into: list :param sep: Separator to split on. Defaults to '_' :type sep: str :param remove: If True removes the input column from the output data frame :type remove: bool .. rubric:: Examples >>> df = tp.tibble(x = ['a_a', 'b_b', 'c_c']) >>> df.separate('x', into = ['left', 'right']) .. py:method:: set_names(nm=None) Change the column names of the data frame :param nm: A list of new names for the data frame :type nm: list .. rubric:: Examples >>> df = tp.tibble(x = range(3), y = range(3)) >>> df.set_names(['a', 'b']) .. py:method:: select(*args) Select or drop columns :param \*args: Columns to select :type \*args: str, Expr .. 
rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.select('a', 'b') >>> df.select(col('a'), col('b')) .. py:method:: slice(*args, _by=None) Grab rows from a data frame :param \*args: Rows to grab :type \*args: int, list :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice(0, 1) >>> df.slice(0, _by = 'c') .. py:method:: slice_head(n=5, *, _by=None) Grab top rows from a data frame :param n: Number of rows to grab :type n: int :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_head(2) >>> df.slice_head(1, _by = 'c') .. py:method:: slice_tail(n=5, *, _by=None) Grab bottom rows from a data frame :param n: Number of rows to grab :type n: int :param _by: Columns to group by :type _by: str, list .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.slice_tail(2) >>> df.slice_tail(1, _by = 'c') .. py:method:: summarise(*args, _by=None, **kwargs) Alias for `.summarize()` .. py:method:: summarize(*args, _by=None, **kwargs) Aggregate data with summary statistics :param \*args: Summary expressions to compute :type \*args: Expr :param _by: Columns to group by :type _by: str, list :param \*\*kwargs: Named summary expressions to compute :type \*\*kwargs: Expr .. rubric:: Examples >>> df = tp.tibble({'a': range(3), 'b': range(3), 'c': ['a', 'a', 'b']}) >>> df.summarize(avg_a = tp.mean(col('a'))) >>> df.summarize(avg_a = tp.mean(col('a')), ... _by = 'c') >>> df.summarize(avg_a = tp.mean(col('a')), ... max_b = tp.max(col('b'))) .. py:method:: tail(n=5, *, _by=None) Alias for `.slice_tail()` .. 
py:method:: unite(col='_united', unite_cols=[], sep='_', remove=True) Unite multiple columns by pasting strings together :param col: Name of the new column :type col: str :param unite_cols: List of columns to unite :type unite_cols: list :param sep: Separator to use between values :type sep: str :param remove: If True removes input columns from the data frame :type remove: bool .. rubric:: Examples >>> df = tp.tibble(a = ["a", "a", "a"], b = ["b", "b", "b"], c = range(3)) >>> df.unite("united_col", unite_cols = ["a", "b"]) .. py:method:: write_csv(file=None, has_headers=True, sep=',') Write a data frame to a CSV file .. py:method:: write_parquet(file=str, compression='snappy', use_pyarrow=False, **kwargs) Write a data frame to a Parquet file .. py:property:: names Get column names .. rubric:: Examples >>> df.names .. py:property:: ncol Get number of columns .. rubric:: Examples >>> df.ncol .. py:property:: nrow Get number of rows .. rubric:: Examples >>> df.nrow .. py:property:: plot Access to polars plotting .. rubric:: Examples >>> df.plot .. py:function:: desc(x) Mark a column to be sorted in descending order .. py:function:: as_tibble(x) Convert an object to a tibble :param x: Object to convert to a tibble :type x: [pl.DataFrame, pd.DataFrame, dict] .. rubric:: Examples >>> tp.as_tibble(polars_df) .. py:function:: is_tibble(x) Check whether an object is a tibble :param x: Object to check :type x: object .. rubric:: Examples >>> tp.is_tibble(df) .. py:function:: from_polars(df) Convert from polars DataFrame to tibble :param df: pl.DataFrame to convert to a tibble :type df: DataFrame .. rubric:: Examples >>> tp.from_polars(df) .. py:function:: from_pandas(df) Convert from pandas DataFrame to tibble :param df: pd.DataFrame to convert to a tibble :type df: DataFrame .. rubric:: Examples >>> tp.from_pandas(df)
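
The four ``direction`` values accepted by ``tibble.fill`` are listed above without a description of how they differ. The following pure-Python sketch illustrates one plausible reading, assuming the semantics match tidyr's ``fill()`` (``'downup'`` fills down first and then up, ``'updown'`` the reverse). That assumption is not confirmed by this reference, and the helper names ``fill_down``, ``fill_up`` and ``fill_values`` are hypothetical; this is not how tidypolars implements the method.

```python
from typing import Any, List, Optional


def fill_down(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Replace each None with the most recent non-None value before it."""
    out = []
    last = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def fill_up(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Replace each None with the next non-None value after it."""
    return fill_down(values[::-1])[::-1]


def fill_values(values: List[Optional[Any]], direction: str = "down") -> List[Optional[Any]]:
    """Hypothetical helper mirroring the four documented directions.

    Assumes tidyr-style semantics: 'downup' fills down and then up,
    'updown' fills up and then down.
    """
    if direction == "down":
        return fill_down(values)
    if direction == "up":
        return fill_up(values)
    if direction == "downup":
        return fill_up(fill_down(values))
    if direction == "updown":
        return fill_down(fill_up(values))
    raise ValueError("direction must be one of 'down', 'up', 'downup', 'updown'")


# Column 'b' from the fill() example in this reference.
b = [None, 2, None, None, 5]
for direction in ("down", "up", "downup", "updown"):
    print(direction, fill_values(b, direction))
```

Under these assumptions, ``'down'`` leaves a leading null in place because there is no earlier value to carry forward, while ``'downup'`` fills that leading null from the first non-null value below it.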