Trade Classification With Python

Documentation: https://karelze.github.io/tclf/
Source Code: https://github.com/KarelZe/tclf
tclf is a scikit-learn-compatible implementation of trade classification algorithms that classify financial market transactions as buyer- or seller-initiated trades.
The key features are:
- Easy: Easy to use and learn.
- Sklearn-compatible: Compatible with the sklearn API. Use sklearn metrics and visualizations.
- Feature complete: Wide range of supported algorithms. Use the algorithms individually or stack them like LEGO blocks.
- DataFrame-agnostic: Works with any narwhals-compatible DataFrame, including pandas, Polars, and cuDF.
Installation
With pip:
pip install tclf
With uv:
uv add tclf
Supported Algorithms
- (Rev.) CLNV rule [1]
- (Rev.) EMO rule [2]
- (Rev.) LR algorithm [6]
- (Rev.) Tick test [5]
- Depth rule [3]
- Quote rule [4]
- Tradesize rule [3]
For a primer on trade classification rules, visit the rules section in our docs.
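To give a feel for what these rules compute, here is a minimal NumPy sketch of the textbook tick test and its reversed variant. It is not tclf's implementation, and the function name `tick_test` is hypothetical. A trade counts as buyer-initiated if its price is above the last differing price (uptick or zero-uptick) and seller-initiated if it is below. The reverse tick test compares against the next differing price instead.

```python
import numpy as np


def tick_test(prices, reverse=False):
    """Sketch of the (reverse) tick test (hypothetical helper, not tclf API).

    Returns +1 (buyer-initiated), -1 (seller-initiated) or NaN where no
    differing price exists to compare against.
    """
    prices = np.asarray(prices, dtype=float)
    if reverse:
        # Running the forward test on the reversed series compares each trade
        # with the *next* differing price, i.e. the reverse tick test.
        return tick_test(prices[::-1])[::-1]
    result = np.full(prices.shape, np.nan)
    last_sign = np.nan  # sign of the most recent non-zero price change
    for i in range(1, len(prices)):
        change = np.sign(prices[i] - prices[i - 1])
        if change != 0:
            last_sign = change  # uptick (+1) or downtick (-1)
        # Zero ticks inherit the sign of the last non-zero change.
        result[i] = last_sign
    return result


prices = [1.0, 1.5, 1.5, 1.2, 1.2, 1.3]
forward = tick_test(prices)                # [nan, 1, 1, -1, -1, 1]
backward = tick_test(prices, reverse=True)  # [-1, 1, 1, -1, -1, nan]
```

The first trade has no predecessor, so the forward test leaves it unclassified; symmetrically, the reverse test cannot classify the last trade.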
Minimal Example
Let's start simple: classify trades with the quote rule, and classify all trades that the quote rule cannot classify at random.
tclf accepts any narwhals-compatible DataFrame (pandas, Polars, cuDF, and more) as well as plain numpy arrays.
Polars
import polars as pl
from tclf.classical_classifier import ClassicalClassifier
X = pl.DataFrame(
    {
        "trade_price": [1.5, 2.5, 1.5, 2.5, 1.0, 3.0],
        "bid_ex": [1.0, 1.0, 3.0, 3.0, None, None],
        "ask_ex": [3.0, 3.0, 1.0, 1.0, 1.0, None],
    }
)
clf = ClassicalClassifier(layers=[("quote", "ex")], strategy="random")
clf.fit(X)
probs = clf.predict_proba(X)
pandas
import numpy as np
import pandas as pd
from tclf.classical_classifier import ClassicalClassifier
X = pd.DataFrame(
    [
        [1.5, 1, 3],
        [2.5, 1, 3],
        [1.5, 3, 1],
        [2.5, 3, 1],
        [1, np.nan, 1],
        [3, np.nan, np.nan],
    ],
    columns=["trade_price", "bid_ex", "ask_ex"],
)
clf = ClassicalClassifier(layers=[("quote", "ex")], strategy="random")
clf.fit(X)
probs = clf.predict_proba(X)
Run your script with
$ python main.py
The parameter layers=[("quote", "ex")] sets the quote rule at the exchange level and strategy="random" specifies the fallback strategy for unclassified trades.
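To make the roles of layers and strategy concrete, here is a plain NumPy sketch of what a single quote-rule layer followed by a random fallback computes on the data above. It illustrates the textbook quote rule (buy above the bid-ask midpoint, sell below) and is not tclf's code. The helpers `quote_rule` and `random_fallback` are hypothetical. Treating crossed quotes (bid above ask, as in the third and fourth rows) as unclassifiable is an assumption of this sketch.

```python
import numpy as np


def quote_rule(price, bid, ask):
    """Quote rule sketch: +1 above the midpoint, -1 below, NaN otherwise."""
    price, bid, ask = (np.asarray(a, dtype=float) for a in (price, bid, ask))
    # Assumption: crossed quotes (bid > ask) are invalid and yield no midpoint.
    mid = np.where(ask >= bid, 0.5 * (bid + ask), np.nan)
    # Comparisons against a NaN midpoint are False, so those trades stay NaN.
    return np.where(price > mid, 1.0, np.where(price < mid, -1.0, np.nan))


def random_fallback(pred, rng):
    """Assign a random buy (+1) or sell (-1) to every unclassified (NaN) trade."""
    pred = pred.copy()
    missing = np.isnan(pred)
    pred[missing] = rng.choice([-1.0, 1.0], size=int(missing.sum()))
    return pred


# Same data as the minimal example, with NaN marking missing quotes.
price = [1.5, 2.5, 1.5, 2.5, 1.0, 3.0]
bid_ex = [1.0, 1.0, 3.0, 3.0, np.nan, np.nan]
ask_ex = [3.0, 3.0, 1.0, 1.0, 1.0, np.nan]

layer = quote_rule(price, bid_ex, ask_ex)  # [-1, 1, nan, nan, nan, nan]
pred = random_fallback(layer, np.random.default_rng(seed=0))
```

Only the first two trades have valid exchange quotes, so under these assumptions the remaining four are decided by the fallback alone.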
Advanced Example
Often it is desirable to classify trades using both exchange-level data and NBBO data. Also, data might only be available as a NumPy array. So let's extend the previous example: classify trades with the quote rule at the exchange level first, then with the quote rule at the NBBO, and all remaining trades randomly.
import numpy as np
from sklearn.metrics import accuracy_score
from tclf.classical_classifier import ClassicalClassifier
X = np.array(
    [
        [1.5, 1, 3, 2, 2.5],
        [2.5, 1, 3, 1, 3],
        [1.5, 3, 1, 1, 3],
        [2.5, 3, 1, 1, 3],
        [1, np.nan, 1, 1, 3],
        [3, np.nan, np.nan, 1, 3],
    ]
)
y_true = np.array([-1, 1, 1, -1, -1, 1])
features = ["trade_price", "bid_ex", "ask_ex", "bid_best", "ask_best"]
clf = ClassicalClassifier(
    layers=[("quote", "ex"), ("quote", "best")], strategy="random", features=features
)
clf.fit(X)
acc = accuracy_score(y_true, clf.predict(X))
Here, the input data is a NumPy array containing both exchange-level data ("ex") and NBBO data ("best"). We set layers=[("quote", "ex"), ("quote", "best")] to classify trades first on the subset "ex" and the remaining trades on the subset "best". Because a NumPy array carries no column names, we also have to pass ClassicalClassifier(..., features=features) to tell the classifier which column holds which feature.
Like before, column/feature names must follow our naming conventions.
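The layering can be sketched in plain NumPy: each layer only classifies the trades that earlier layers left unclassified (NaN). The helpers `quote_layer` and `stack_layers` below are hypothetical and not part of tclf. Columns are looked up by position through the same `features` list, and treating crossed quotes (bid above ask) as unclassifiable is again an assumption of this sketch.

```python
import numpy as np

FEATURES = ["trade_price", "bid_ex", "ask_ex", "bid_best", "ask_best"]


def quote_layer(X, features, subset):
    """Quote rule on one quote subset, e.g. "ex" or "best" (NaN = unclassified)."""
    col = {name: X[:, i] for i, name in enumerate(features)}
    price, bid, ask = col["trade_price"], col[f"bid_{subset}"], col[f"ask_{subset}"]
    # Assumption: crossed quotes (bid > ask) yield no midpoint.
    mid = np.where(ask >= bid, 0.5 * (bid + ask), np.nan)
    return np.where(price > mid, 1.0, np.where(price < mid, -1.0, np.nan))


def stack_layers(X, features, subsets):
    """Apply quote layers in order; later layers only fill still-unclassified trades."""
    pred = np.full(X.shape[0], np.nan)
    for subset in subsets:
        missing = np.isnan(pred)
        pred[missing] = quote_layer(X, features, subset)[missing]
    return pred


X = np.array(
    [
        [1.5, 1, 3, 2, 2.5],
        [2.5, 1, 3, 1, 3],
        [1.5, 3, 1, 1, 3],
        [2.5, 3, 1, 1, 3],
        [1, np.nan, 1, 1, 3],
        [3, np.nan, np.nan, 1, 3],
    ]
)
pred = stack_layers(X, FEATURES, ["ex", "best"])  # [-1, 1, -1, 1, -1, 1]
```

In this sketch the "ex" layer classifies only the first two trades, and the "best" layer fills the other four, so a random fallback would have nothing left to decide for these rows.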
Other Examples
For more practical examples, see our examples section.
Development
We use tox with uv for development:
tox -e lint
tox -e format
tox -e test
tox -e build
Citation
If you are using the package in publications, please cite as:
@software{bilz_tclf_2024,
  author = {Bilz, Markus},
  license = {BSD 3},
  month = feb,
  title = {{tclf} -- trade classification with python},
  url = {https://github.com/KarelZe/tclf},
  version = {0.3.0},
  year = {2024}
}
Footnotes
[1] Chakrabarty, B., Li, B., Nguyen, V., & Van Ness, R. A. (2007). Trade classification algorithms for electronic communications network trades. Journal of Banking & Finance, 31(12), 3806–3821. https://doi.org/10.1016/j.jbankfin.2007.03.003
[2] Ellis, K., Michaely, R., & O'Hara, M. (2000). The accuracy of trade classification rules: Evidence from Nasdaq. The Journal of Financial and Quantitative Analysis, 35(4), 529–551. https://doi.org/10.2307/2676254
[3] Grauer, C., Schuster, P., & Uhrig-Homburg, M. (2023). Option trade classification. SSRN Working Paper. https://doi.org/10.2139/ssrn.4098475
[4] Harris, L. (1989). A day-end transaction price anomaly. The Journal of Financial and Quantitative Analysis, 24(1), 29–37. https://doi.org/10.2307/2330746
[5] Hasbrouck, J. (1988). Trades, quotes, inventories, and information. Journal of Financial Economics, 22(2), 229–252. https://doi.org/10.1016/0304-405X(88)90070-0
[6] Lee, C., & Ready, M. J. (1991). Inferring trade direction from intraday data. The Journal of Finance, 46(2), 733–746. https://doi.org/10.1111/j.1540-6261.1991.tb02683.x