The purpose of this project is to build a medical database in a mainframe environment. To do so, we will gather readily available data from government-regulated websites. This data must be in .CSV format and will be transferred to the mainframe through FTP. In parallel, we will create a logical model of our database using ERwin Data Modeling software. We will then use DDL to create a DB2 database on the mainframe to store the acquired data. Finally, we will run SQL queries against this database to look for interesting trends, patterns, and statistics, and use FTP to bring those results off the mainframe and into user-friendly “front-end” software.
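As an illustration of the DDL step, the Python sketch below reads the header and a sample of rows from a .CSV file and drafts a DB2 CREATE TABLE statement. It is hypothetical: the table name MEDDB.PATIENTS, the sample columns, and the type rules (INTEGER or BIGINT for whole numbers, DOUBLE for decimals, VARCHAR sized to the longest value otherwise) are assumptions, and the result is a starting point to refine against the ERwin model rather than a finished schema.

```python
import csv
import io
from itertools import islice

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a DB2 INTEGER column


def _parses(values, parse):
    """Return True if every value can be converted with `parse`."""
    try:
        for v in values:
            parse(v)
        return True
    except ValueError:
        return False


def infer_db2_type(values):
    """Pick a conservative DB2 column type from a column's sample strings."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR(255)"  # no evidence either way; fall back to text
    if _parses(non_empty, int):
        if all(INT_MIN <= int(v) <= INT_MAX for v in non_empty):
            return "INTEGER"
        return "BIGINT"
    if _parses(non_empty, float):
        return "DOUBLE"
    return f"VARCHAR({max(len(v) for v in non_empty)})"


def db2_identifier(name):
    """Turn a CSV header into an upper-case identifier that starts with a letter."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in name.strip()).upper()
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "C_" + cleaned
    return cleaned


def csv_to_create_table(csv_text, table_name, sample_rows=100):
    """Read the header plus up to `sample_rows` rows and emit CREATE TABLE DDL."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    columns = []
    for i, name in enumerate(header):
        # Short rows are treated as having blank trailing fields.
        values = [row[i] if i < len(row) else "" for row in rows]
        columns.append(f"  {db2_identifier(name)} {infer_db2_type(values)}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"


# Hypothetical sample file contents.
sample = "patient_id,age,bmi,state\n1,34,22.5,NJ\n2,58,,NY\n"
print(csv_to_create_table(sample, "MEDDB.PATIENTS"))
# CREATE TABLE MEDDB.PATIENTS (
#   PATIENT_ID INTEGER,
#   AGE INTEGER,
#   BMI DOUBLE,
#   STATE VARCHAR(2)
# );
```

Type inference from samples is only a first pass: identifiers with leading zeros, such as ZIP codes, would be inferred as INTEGER and lose those zeros, so columns like that should be overridden to CHAR or VARCHAR by hand.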
Monday-Thursday 10:00am-12:00pm / 1:00pm-3:00pm in room S-243
- Professor Alan S. Eliscu — aeliscu (at) bergen.edu
June 4, 2019
Introduced to the basic concepts of data analytics on the mainframe and of Extract, Transform, Load (ETL) data integration, as well as to the project's vision and mission.
Searched for and gathered EMRs (Electronic Medical Records). Learned the process for requesting the large medical databases that a variety of national medical societies and institutions offer specifically to researchers.
Using the materials and parts available, we needed to solve the logistical problem of working with large files on a limited network with no centralized storage. To do this, we created LOLA (Logistical Output Load Array), which serves both as a VM server (Linux, Windows, Android, Mac OS, etc.) and as a network drive where we centralize all our work, keep it easily up to date, and back it up.
Learned the basics of the ETL process and of database architecture from Professor Eliscu, with the goal of organizing the data so that a parallel key could be created when it is loaded into the mainframe for queries. Organizing the data, however, is a heavier workload than formatting it physically, because of the size of our data and the difficulty of correctly identifying the important data involving biometrics.
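To make the ETL steps concrete, here is a minimal Python sketch of extract, transform, and load. Because it has to run without a mainframe connection, it loads into an in-memory SQLite database as a stand-in for DB2; the column names, sample values, and the sequential record_id surrogate key are illustrative assumptions, not the project's actual schema or its parallel key.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: split CSV text into a normalized header and raw rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [h.strip().lower() for h in next(reader)]
    return header, [row for row in reader if row]  # skip blank lines


def transform(header, rows):
    """Transform: trim fields, map blanks to None, drop exact duplicates,
    and prepend a sequential surrogate key to each surviving record."""
    seen = set()
    records = []
    for row in rows:
        fields = tuple((row[i].strip() if i < len(row) else "") or None
                       for i in range(len(header)))
        if fields in seen:
            continue
        seen.add(fields)
        records.append((len(records) + 1,) + fields)
    return ["record_id"] + header, records


def load(conn, table, columns, records):
    """Load: create the table and bulk-insert the records.
    Column names are interpolated, so they must come from a trusted header."""
    col_defs = ", ".join(["record_id INTEGER PRIMARY KEY"]
                         + [f"{c} TEXT" for c in columns[1:]])
    conn.execute(f"CREATE TABLE {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", records)
    conn.commit()


# Hypothetical raw file: stray spaces and one duplicate row.
raw = "Age, State\n34,NJ\n58, NY\n34,NJ\n41,NY\n"
columns, records = transform(*extract(raw))
conn = sqlite3.connect(":memory:")  # stand-in for the DB2 database
load(conn, "patients", columns, records)
summary = conn.execute(
    "SELECT state, COUNT(*), AVG(CAST(age AS REAL)) "
    "FROM patients GROUP BY state ORDER BY state"
).fetchall()
print(summary)  # [('NJ', 1, 34.0), ('NY', 2, 49.5)]
```

On the mainframe, the load step would target the DB2 table created from the DDL instead, and the closing GROUP BY query is the kind of SQL used to look for trends in the loaded data.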
Continued our earlier research, looking for relevant and usable data files. Researched options for the ETL process and for data analysis, with the aim of automating both the file transfer on and off the mainframe and the organization of the data. The most promising option appeared to be a program called Tableau, since it is compatible with the mainframe, can work with DB2 databases, and supports the IBM Cloud platform.
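One way to automate the transfer on and off the mainframe, independent of any front-end tool, is a short script built on Python's standard ftplib module. This is a sketch under assumptions: it assumes a z/OS FTP server that accepts the SITE RECFM and LRECL subcommands and treats single-quoted names as fully qualified data sets, and the host, credentials, record length, and data set names (such as MEDDB.PATIENTS.CSV) are hypothetical placeholders.

```python
import io
from ftplib import FTP


def site_command(recfm="VB", lrecl=1024):
    """z/OS FTP SITE command describing the record format of the new data set."""
    return f"SITE RECFM={recfm} LRECL={lrecl}"


def dataset_ref(name):
    """Quote a data set name so the server treats it as fully qualified."""
    return f"'{name.upper()}'"


def upload_csv(ftp, csv_bytes, dataset, lrecl=1024):
    """Send CSV text to a mainframe data set.
    storlines() switches the session to ASCII (TYPE A), in which the z/OS
    server typically translates the text to EBCDIC on arrival."""
    ftp.sendcmd(site_command(lrecl=lrecl))
    ftp.storlines(f"STOR {dataset_ref(dataset)}", io.BytesIO(csv_bytes))


def download_results(ftp, dataset):
    """Retrieve a result data set (e.g. exported query output) as text lines."""
    lines = []
    ftp.retrlines(f"RETR {dataset_ref(dataset)}", lines.append)
    return lines


def round_trip(host, user, password, csv_path, in_dsn, out_dsn):
    """Hypothetical end-to-end run: upload raw data, then fetch results."""
    with open(csv_path, "rb") as f:  # storlines() needs a binary file
        csv_bytes = f.read()
    with FTP(host) as ftp:
        ftp.login(user, password)
        upload_csv(ftp, csv_bytes, in_dsn)
        return download_results(ftp, out_dsn)
```

A call such as `round_trip("mainframe.example", "USERID", "secret", "patients.csv", "MEDDB.PATIENTS.CSV", "MEDDB.RESULTS.CSV")` would push the file up and return the result lines; all of those arguments are placeholders.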
And so it begins! This week we began organizing our workflow and planning our research process. We consolidated the data we had previously found into a single save location on LOLA and determined what additional data we would need. We found several gaps in our process and began to explore alternative solutions. Specifically, we needed better data files that could produce the type of database we hoped to build.