\chapter{Multivariate Analysis For Particle Identification}

Multivariate data analysis and machine learning have become useful tools in high-energy physics. The need for more sophisticated data analysis algorithms arose as classification problems grew in complexity.
In T2K, selecting a neutrino interaction event is like finding a needle in a haystack, owing to the tiny neutrino cross-section and the large number of background events.
Nevertheless, increasing the selection purity and efficiency is crucial for precision measurements of neutrino cross-sections.
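For reference, the two figures of merit can be written using the conventional definitions. The symbols below are notation introduced here for illustration and are not taken from the analysis itself: $N^{\mathrm{sig}}_{\mathrm{sel}}$ and $N^{\mathrm{bkg}}_{\mathrm{sel}}$ denote the numbers of selected signal and background events, and $N^{\mathrm{sig}}_{\mathrm{tot}}$ the total number of signal events before the selection.
\begin{equation}
\varepsilon = \frac{N^{\mathrm{sig}}_{\mathrm{sel}}}{N^{\mathrm{sig}}_{\mathrm{tot}}},
\qquad
p = \frac{N^{\mathrm{sig}}_{\mathrm{sel}}}{N^{\mathrm{sig}}_{\mathrm{sel}} + N^{\mathrm{bkg}}_{\mathrm{sel}}}
\end{equation}
Here $\varepsilon$ is the efficiency and $p$ the purity. Tightening a selection typically raises $p$ at the cost of $\varepsilon$, which is why a classifier that improves both at once is valuable.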

In this thesis, a machine learning algorithm called a Boosted Decision Tree (BDT) is used as a particle identification (PID) classifier.
Information gained from the ND280 …

To illustrate this idea,~\cref{fig:MVA_KIT} shows the signal and background distributions of two measured variables, var0 and var1, in a toy example.
Applying traditional cuts to var0 or var1 alone results in very poor efficiency. However, by visualising the two-dimensional distribution of var0 and var1, one can find a better decision boundary to separate signal from background.
\begin{figure}[H]
\centering
\includegraphics[scale = 0.55]{./Include/MVAdv.jpg}
\caption{Single and multivariate cut effects on correlated data. The normalised probability distributions of var0 and var1 for signal (blue) and background (red) in a toy example are shown in the left and centre plots, respectively.
A better decision boundary, which exploits the correlation between the variables, is shown in green in the right plot. Figure courtesy of~\cite{MVA_KIT}.} \label{fig:MVA_KIT}
\end{figure}
Exploiting the correlations between variables increases the efficiency and purity of the selection.
Such relationships can be visualised for two- or three-dimensional problems, but a computer algorithm is needed to optimise the decision boundary in higher-dimensional feature spaces.
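
As a minimal sketch of what exploiting the correlation can mean, consider a linear decision boundary in the two toy variables. The weights $w_0$, $w_1$ and the threshold $c$ are notation introduced here for illustration only, and the boundary drawn in \cref{fig:MVA_KIT} need not be linear. An event is accepted as signal if
\begin{equation}
w_0\,\mathrm{var0} + w_1\,\mathrm{var1} > c .
\end{equation}
Setting $w_1 = 0$ with $w_0 > 0$ recovers a traditional one-dimensional cut, $\mathrm{var0} > c/w_0$, so a single-variable cut is a special case of this family. Allowing $w_1 \neq 0$ lets the boundary tilt along the direction in which the two variables are correlated.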

\section{Event Classification}
Each event, whether signal or background, has $D$ measured variables that span a $D$-dimensional feature space; for instance, the features used in the positron selection are given in \cref{table:BDT_InputVariables}.
A machine learning algorithm is a map from the $D$-dimensional
