Breast cancer is one of the most common causes of death among women worldwide, and statistics indicate that India ranks highest in the world in breast cancer deaths. This alarming fact makes early detection and treatment of the disease essential. Automated systems that process and analyze regional breast cancer data would help medical experts understand the severity of the disease and the nature of the patient population.
Most medical reports in India are hard copies with natural language descriptions. Processing natural language text poses many challenges, as it requires handling varied reporting formats, language styles, and diverse representations of facts within the content. Specific processing requirements include handling typographical errors, abbreviations, spelling variations, and variations in the representation of numerical values and measures.
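To make these requirements concrete, the following is a minimal Python sketch of three of them: abbreviation expansion, spelling normalization, and unification of length measures. It is an illustration under stated assumptions, not the system described in this work. The lookup tables `ABBREVIATIONS` and `SPELLING_VARIANTS`, the function names, and the choice of centimetres as the target unit are all hypothetical, and typographical error correction is left out.

```python
import re

# Hypothetical lookup tables, for illustration only. A real system would
# need vocabularies reviewed by clinicians.
ABBREVIATIONS = {"ca": "carcinoma", "lt": "left", "rt": "right"}
SPELLING_VARIANTS = {"tumour": "tumor", "tumours": "tumors"}

# A number followed by a length unit, e.g. "2.5cm", "2.5 cms", "25 mm".
MEASURE = re.compile(r"(\d+(?:\.\d+)?)\s*(cms?|mms?)\b", re.IGNORECASE)
WORD = re.compile(r"[a-z]+")


def normalize_measures(text):
    """Rewrite length measures as centimetres in the form '<value> cm'."""
    def to_cm(match):
        value = float(match.group(1))
        if match.group(2).lower().startswith("mm"):
            value /= 10.0  # convert millimetres to centimetres
        return f"{value:g} cm"
    return MEASURE.sub(to_cm, text)


def normalize_report(text):
    """Lowercase the text, unify measures, then map each word through the
    spelling and abbreviation tables (assumed tables, see above)."""
    text = normalize_measures(text.lower())

    def expand(match):
        word = SPELLING_VARIANTS.get(match.group(0), match.group(0))
        return ABBREVIATIONS.get(word, word)

    return WORD.sub(expand, text)


print(normalize_report("Ca Lt breast, tumour size 25mm"))
```

A dictionary lookup of this kind is deliberately naive: an abbreviation such as "ca" can have more than one expansion depending on context, so a practical system would need disambiguation on top of it.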
In recent years, hospitals in India have been turning to electronic means of data collection.
Compared with processing medical reports from developed countries, processing medical reports from hospitals in India poses additional challenges. Developed countries use standard medical coding systems such as SNOMED and ICD-O in their reporting, which provide a common ground for analysis and inference. Hospital reporting systems in India do not use medical coding, which makes processing electronic health records a challenging task. In addition, medical data are rarely available
