Column names are lost after row-filtering an empty DataFrame
I have a function that filters the rows of a pandas DataFrame by a condition.
It works fine for a non-empty DataFrame, but when I filter an empty DataFrame that has column names,
the resulting empty DataFrame is missing its column names.
The problem can be reproduced as follows:
In [5]: t1 = pd.DataFrame(columns=["a","b"])
In [6]: t2=pd.DataFrame({"a":[-1,1],"b":[2,3]})
In [7]: t1
Out[7]:
Empty DataFrame
Columns: [a, b]
Index: []
In [8]: t2
Out[8]:
   a  b
0 -1  2
1  1  3
In [13]: def myfunc1(row):
    ...:     if row.empty:
    ...:         print(row)
    ...:         return True
    ...:     if int(row["a"])>0:
    ...:         return True
    ...:     else:
    ...:         return False
    ...:
In [17]: t2[t2.apply(myfunc1, axis=1)]
Out[17]:
   a  b
1  1  3
In [18]: t1[t1.apply(myfunc1, axis=1)]
Series([], dtype: float64)
Out[18]:
Empty DataFrame
Columns: []
Index: []
Filtering t2 returns a new DataFrame
containing the rows where row["a"] > 0,
but why did t1 lose its columns after filtering?
The columns are used in subsequent processing, so I want to know
why they were lost.
Because the conditions used to index t1 and t2 are different. For t2, the condition is a boolean Series that selects only the second row:
In [18]: t2.apply(myfunc, axis=1)
Out[18]:
0    False
1     True
dtype: bool
In [19]: t2[t2.apply(myfunc, axis=1)]
Out[19]:
   a  b
1  1  3
For t1 the condition is different: it holds no True/False values and is just an empty Series of dtype float64.
In [20]: t1.apply(myfunc, axis=1)
Out[20]: Series([], dtype: float64)
In [21]: t1[t1.apply(myfunc, axis=1)]
Series([], dtype: float64)
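One way to sidestep this, as a rough sketch: build the mask from the column itself instead of going through apply(axis=1), so the key is a boolean Series even when there are no rows. `filter_positive_a` is a hypothetical helper name used only for illustration, and the exact result of apply on an empty DataFrame may differ between pandas versions.

```python
import pandas as pd

t1 = pd.DataFrame(columns=["a", "b"])            # empty, but has column names
t2 = pd.DataFrame({"a": [-1, 1], "b": [2, 3]})


def filter_positive_a(df):
    # Hypothetical helper, not part of pandas.
    # Comparing the column directly yields a boolean Series,
    # including the case where df has zero rows.
    mask = df["a"] > 0
    return df[mask]


print(filter_positive_a(t2))                     # rows of t2 where a > 0
print(list(filter_positive_a(t1).columns))       # column names of the filtered empty frame
```

Because the mask has a boolean dtype in both cases, the indexing is treated as a row filter and the column names are carried over.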
Thank you @everfigt
I still don't understand how the columns get lost.
The t2 case is easy to understand: the rows are filtered by True/False. So for t1, the empty DataFrame is being treated as the condition, right?
But this is what I get when I index with an empty DataFrame directly:
In [13]: t1[pd.DataFrame()]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-c296ac7af466> in <module>()
----> 1 t1[pd.DataFrame()]
~/programs/venv36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2133 return self._getitem_array(key)
2134 elif isinstance(key, DataFrame):
-> 2135 return self._getitem_frame(key)
2136 elif is_mi_columns:
2137 return self._getitem_multilevel(key)
~/programs/venv36/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_frame(self, key)
2217 if key.values.size and not is_bool_dtype(key.values):
2218 raise ValueError('Must pass DataFrame with boolean values only')
-> 2219 return self.where(key)
2220
2221 def query(self, expr, inplace=False, **kwargs):
~/programs/venv36/lib/python3.6/site-packages/pandas/core/generic.py in where(self, cond, other, inplace, axis, level, errors, try_cast, raise_on_error)
6128 other = com._apply_if_callable(other, self)
6129 return self._where(cond, other, inplace, axis, level,
-> 6130 errors=errors, try_cast=try_cast)
6131
6132 @Appender(_shared_docs['where'] % dict(_shared_doc_kwargs, cond="False",
~/programs/venv36/lib/python3.6/site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
5889 for dt in cond.dtypes:
5890 if not is_bool_dtype(dt):
-> 5891 raise ValueError(msg.format(dtype=dt))
5892
5893 cond = cond.astype(bool, copy=False)
ValueError: Boolean array expected for the condition, not float64
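A possible explanation, offered as a guess rather than a confirmed reading of the pandas source: the `__getitem__` code visible in the traceback has a separate `_getitem_array` branch, and a key that is neither a DataFrame nor a boolean mask appears to be read as a list of column labels. An empty float64 Series would then select zero columns, much like `t1[[]]` does. The sketch below shows that label-based case and one way to keep a row-wise predicate while forcing the mask to be boolean. `filter_rows` is a hypothetical helper, and `myfunc` is a simplified stand-in for the function in the question.

```python
import pandas as pd

t1 = pd.DataFrame(columns=["a", "b"])
t2 = pd.DataFrame({"a": [-1, 1], "b": [2, 3]})

# An empty list of column labels selects zero columns,
# which mirrors the "Columns: []" result in the question.
no_columns = t1[[]]


def myfunc(row):
    # Simplified stand-in for the predicate in the question.
    return int(row["a"]) > 0


def filter_rows(df, predicate):
    # Hypothetical helper, not a pandas API.
    # Evaluate the predicate row by row and force dtype=bool, so that
    # a zero-row frame still produces a boolean mask (not float64).
    flags = [predicate(row) for _, row in df.iterrows()]
    mask = pd.Series(flags, index=df.index, dtype=bool)
    return df.loc[mask]


print(list(no_columns.columns))
print(list(filter_rows(t1, myfunc).columns))
print(filter_rows(t2, myfunc))
```

Forcing dtype=bool is the key design choice here: with an explicit boolean mask the selection is a row filter for both t1 and t2, so the column names survive.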