pdmongo
pdmongo.read_mongo(collection: str, query: List[Dict[str, Any]], db: Union[str, pymongo.database.Database], index_col: Union[str, List[str], None] = None, extra: Optional[Dict[str, Any]] = None, chunksize: Optional[int] = None) → pandas.core.frame.DataFrame
Read MongoDB query into a DataFrame.
Returns a DataFrame corresponding to the result set of the query. Optionally pass index_col to use one or more of the result columns as the index; otherwise the default integer index is used.
Parameters:
- collection (str) – Mongo collection to query.
- query (list) – Must be an aggregate query, given as a list of pipeline stages. The input is passed to pymongo's aggregate method.
- db (pymongo.database.Database or database string URI) – The database to use
- index_col (str or list of str, optional, default: None) – Column(s) to set as the index (MultiIndex for multiple columns).
- extra (dict, optional, default: None) – Dictionary of additional parameters to pass to the aggregate method.
- chunksize (int, optional, default: None) – If specified, return an iterator where chunksize is the number of documents to include in each chunk.
Returns: DataFrame
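The parameters above can be combined as in the following minimal sketch. The collection name `orders`, the fields `customer` and `total`, and the helper names `build_pipeline`, `load_orders`, and `iter_order_chunks` are hypothetical and chosen only for illustration. The sketch also assumes that, when chunksize is given, each item produced by the iterator is one chunk of the result. pdmongo is imported inside the functions, so the pipeline builder can be used without a MongoDB connection.

```python
from typing import Any, Dict, List


def build_pipeline(min_total: float) -> List[Dict[str, Any]]:
    """Build an aggregate query: filter documents, then keep two fields."""
    return [
        {"$match": {"total": {"$gte": min_total}}},
        {"$project": {"_id": 0, "customer": 1, "total": 1}},
    ]


def load_orders(uri: str, min_total: float):
    """Read all matching documents at once, indexed by the customer column."""
    import pdmongo as pdm  # needs pdmongo installed and a reachable MongoDB

    return pdm.read_mongo(
        "orders",                   # hypothetical collection name
        build_pipeline(min_total),  # aggregate query (list of stages)
        uri,                        # database given as a string URI
        index_col="customer",       # hypothetical column used as the index
    )


def iter_order_chunks(uri: str, min_total: float, chunksize: int = 500):
    """Yield the result chunk by chunk (assumed: one chunk per iteration)."""
    import pdmongo as pdm  # needs pdmongo installed and a reachable MongoDB

    chunks = pdm.read_mongo(
        "orders", build_pipeline(min_total), uri, chunksize=chunksize
    )
    for chunk in chunks:
        yield chunk
```

With a database available, a call such as `load_orders("mongodb://localhost:27017/shop", 100.0)` would be expected to return the matching documents as a DataFrame; the URI shown is a placeholder.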
pdmongo.to_mongo(frame: pandas.core.frame.DataFrame, name: str, db: Union[str, pymongo.database.Database], if_exists: Optional[str] = 'fail', index: Optional[bool] = True, index_label: Union[str, Sequence[str], None] = None, chunksize: Optional[int] = None) → Union[List[pymongo.results.InsertManyResult], pymongo.results.InsertManyResult]
Write records stored in a DataFrame to a MongoDB collection.
Parameters:
- frame (DataFrame, Series)
- name (str) – Name of collection.
- db (pymongo.database.Database or database string URI) – The database to write to
- if_exists ({‘fail’, ‘replace’, ‘append’}, default ‘fail’) – What to do if the collection already exists:
  - fail: If the collection exists, do nothing.
  - replace: If the collection exists, drop it, recreate it, and insert data.
  - append: If the collection exists, insert data. Create it if it does not exist.
- index (boolean, default True) – Write DataFrame index as a column.
- index_label (str or sequence, optional) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
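A write call using these parameters might look like the sketch below. The wrapper `save_frame`, the constant `VALID_IF_EXISTS`, and the batch size of 1000 are illustrative assumptions and not part of pdmongo. The wrapper only checks if_exists against the three documented values before delegating to `to_mongo`.

```python
# The three if_exists values listed in the parameter description above.
VALID_IF_EXISTS = {"fail", "replace", "append"}


def save_frame(frame, name: str, uri: str, if_exists: str = "append"):
    """Write a DataFrame to a collection in batches (hypothetical wrapper)."""
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(
            f"if_exists must be one of {sorted(VALID_IF_EXISTS)}, got {if_exists!r}"
        )

    import pdmongo as pdm  # needs pdmongo installed and a reachable MongoDB

    return pdm.to_mongo(
        frame,
        name,                 # name of the target collection
        uri,                  # database given as a string URI
        if_exists=if_exists,
        index=False,          # do not write the DataFrame index as a column
        chunksize=1000,       # rows per batch; illustrative value
    )
```

According to the signature, the return value is either a single `pymongo.results.InsertManyResult` or a list of them, so callers may need to handle both cases.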