Gopu R Potty, Chair, Technology Committee on Data Analytics, Integration and Modeling
The focus of the Technology Committee on Data Analytics, Integration and Modeling encompasses all aspects of data processing, including data assimilation, data presentation, database design, filtering, modeling, and analysis. The committee also covers all activities and products associated with computer-oriented modeling, simulation, and databases within ocean engineering and science. Key research topics for this committee include data fusion, computational intelligence, artificial intelligence and machine learning, and visualization tools, among many others.
The field of Machine Learning (ML) has grown explosively during the past decade and has found applications in a large number of fields. This growth was fueled in part by the availability of large amounts of data and increases in computational capability. The power of ML algorithms comes from their ability to learn from the data. This learning process can be supervised, unsupervised, or semi-supervised.
ML approaches are generally divided into two categories: supervised and unsupervised learning. Supervised learning uses labelled data (input-output pairs) to train ML algorithms. A properly trained ML algorithm can then be used to predict the output for new inputs. Care must be taken to use unbiased and reliable training data, because any bias in the labelled examples is carried into the trained model. A simple example of supervised learning is regression. In unsupervised learning, the ML algorithm is trained on unlabelled data to recognize patterns and structures within the data, using techniques such as clustering and dimensionality reduction (Principal Component Analysis, for example). Recent ML advancements include Generative Adversarial Networks (GANs) and Reinforcement Learning.
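The two categories can be illustrated with a minimal sketch. It assumes synthetic data and hypothetical variable names (`depths`, `temps`, `readings`) chosen only for illustration: a least-squares line fit stands in for supervised regression, and a one-dimensional k-means stands in for unsupervised clustering.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Supervised learning: ordinary least-squares fit of y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised learning: Lloyd's k-means on scalar values."""
    ordered = sorted(values)
    # Deterministic start: centres at evenly spaced order statistics.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        # Keep the old centre if a cluster happens to be empty.
        centres = [mean(c) if c else centres[j] for j, c in enumerate(clusters)]
    return centres


rng = random.Random(0)  # fixed seed so the synthetic data are repeatable

# Labelled input-output pairs (synthetic): temperature falling with depth.
depths = [float(d) for d in range(0, 50, 5)]
temps = [20.0 - 0.1 * d + rng.gauss(0.0, 0.05) for d in depths]
slope, intercept = fit_line(depths, temps)
predicted = slope * 60.0 + intercept  # prediction for a new, unseen input

# Unlabelled readings (synthetic) drawn from two distinct groups.
readings = [rng.gauss(5.0, 0.3) for _ in range(30)]
readings += [rng.gauss(15.0, 0.3) for _ in range(30)]
centres = sorted(kmeans_1d(readings, k=2))

print(f"fitted slope {slope:.3f}, intercept {intercept:.2f}")
print(f"cluster centres {centres[0]:.2f} and {centres[1]:.2f}")
```

The regression is given the answers (`temps`) during training, while the clustering step receives only `readings` and has to find the two groups on its own.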
ML has transformed data-rich fields, such as the commercial sector, but has made a relatively late entry into scientific disciplines. Data in ocean-related fields are resource intensive and therefore costly to collect, and are often difficult to measure, so they are scarce compared with data in commercial fields. Moreover, the physical variables often exhibit complex non-stationary patterns that change with time, so a large training data set is required for the learning process. Another limitation of applying ML approaches to the physical sciences is their ‘black-box’ nature, which hinders the discovery of physical cause-effect relationships between variables and the understanding of the underlying processes.
Theory-based models perform well in scientific problems that are conceptually well understood through known scientific principles. These models perform poorly in complex problems where the underlying processes are not well understood or when the models involve too many simplifying assumptions. ML algorithms, on the other hand, depend mainly on the information contained in the data and do not rely on any scientific principles. For these algorithms, data scarcity is the limiting factor in complex scientific problems. Attempts to marry these two contrasting approaches by using theory and data in a synergistic manner have recently gained momentum, leading to the development of “theory guided data science” approaches. This idea is also often referred to as “physics guided ML,” “physics informed ML,” or “physics aware ML.”
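One simple way to combine theory and data, shown here only as an illustrative sketch and not as the method of any particular study, is to add a penalty to the data-fitting objective that pulls a fitted parameter toward the value suggested by theory. The function name `fit_line_with_theory`, the `theory_slope` value, and the `weight` parameter are all hypothetical.

```python
from statistics import mean


def fit_line_with_theory(xs, ys, theory_slope, weight):
    """Fit y = a*x + b by minimising
        sum((y - a*x - b)**2) + weight * (a - theory_slope)**2.
    With weight = 0 this reduces to ordinary least squares; a larger
    weight trusts the theoretical slope more than the scarce data.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Closed-form minimiser of the penalised objective.
    a = (sxy + weight * theory_slope) / (sxx + weight)
    b = y_bar - a * x_bar
    return a, b


# Three noisy observations (made-up numbers): too few to trust on their own.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.9, 2.6]

data_only = fit_line_with_theory(xs, ys, theory_slope=0.5, weight=0.0)
guided = fit_line_with_theory(xs, ys, theory_slope=0.5, weight=10.0)

print(f"data-only slope: {data_only[0]:.3f}")
print(f"theory-guided slope: {guided[0]:.3f}")
```

With so few observations, the guided fit lands between the purely data-driven slope and the assumed theoretical slope. Choosing `weight` is a modelling decision that reflects how much confidence is placed in the theory relative to the data.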
ML has been applied (often in combination with physics-based models) to a variety of problems in marine science, including the discovery of climate patterns, habitat modelling, forecasting of sea level fluctuations, wind and wave modelling, automatic detection and tracking of underwater objects, coastal water monitoring, detection of oil spills and pollution, and geoacoustic parameter estimation.
The Technology Committee on Data Analytics, Integration and Modeling is planning a series of webinars this year on the application of ML techniques to various ocean-related problems. We hope that these webinars will spur the use of this powerful tool in your research and other applications.