Introduction

If you consider data a corporate asset, then you will want to use it wisely. That is the role of data governance, and it is critical. Governance guidelines should be set down in writing and must be easy to understand. They should also be reviewed regularly to ensure they still meet the needs of the business over time.

Data governance practices determine the business rules to which your data quality solution will align. For most companies, that responsibility lies with a data governance committee. The commitment to sound data quality and security practices must begin at the top of the organization and include stakeholders at every level; in practice, most data governance committees represent exactly this mix. Data governance is an ongoing commitment.

As company needs change, data governance policies must be reviewed to ensure continued alignment. On a day-to-day basis, many organizations are embracing the role of the data steward. This role began in the IT arena, but trends indicate that it is branching out as the level of accountability entrusted to data stewards increases.

How can IT or business users, or both, make a successful pitch for a data quality project? After all, it is an expense that doesn't have an associated revenue stream. Consider a company that invested millions in a CRM system and was disappointed with the results.

The CRM solution did everything as promised, but the data it processed was incomplete, outdated, and laden with duplicate records. In that scenario, the ROI of cleaning up the data was immediate and obvious; what may be harder to measure are the secondary and tertiary benefits. If you are a financial institution and one of your long-time, high-net-worth clients receives a prospectus asking him to open an account, does that instill trust or frustration?
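To make the duplicate-record problem concrete, a first-pass duplicate check often normalizes a key field and groups on it. The sketch below is illustrative only; the records, column names, and normalization rules are invented for the example, not taken from any vendor's product.

```python
import pandas as pd

# Hypothetical CRM extract; the columns and rows are illustrative assumptions.
customers = pd.DataFrame({
    "name":  ["Jane Q. Smith", "JANE SMITH", "Robert Ortiz"],
    "email": ["jane.smith@example.com", "Jane.Smith@example.com ", "r.ortiz@example.com"],
})

# Normalize the field most likely to vary in formatting, then group on it.
customers["email_key"] = customers["email"].str.strip().str.lower()
dupes = customers[customers.duplicated("email_key", keep=False)]
print(dupes)  # both "Jane" rows surface as candidate duplicates
```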

What we are seeing now are a few trends that have created a more accepting environment among management.

How can businesses measure the effectiveness of their current data-governance initiatives?

The key is determining what to measure and discerning the threshold values. When an alert is issued, you can then trace where in the process the conditions went wrong to trip the alert. This requires a feedback loop after the initial measurement, and both policies and tools are needed to discover where the problem occurred.
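As an illustration of the measure-threshold-alert pattern just described, here is a minimal sketch; the metric names and threshold values are assumptions chosen for the example, not prescribed values.

```python
# Minimal sketch of threshold-based data quality monitoring.
# Metric names and threshold values are illustrative assumptions.
THRESHOLDS = {"null_rate": 0.05, "duplicate_rate": 0.01}

def check_metrics(metrics: dict[str, float]) -> list[str]:
    """Return an alert for every measured metric that trips its threshold."""
    alerts = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"ALERT: {name}={value:.3f} exceeds threshold {limit:.3f}")
    return alerts

# The feedback loop: each alert points back to the step that produced the metric.
for alert in check_metrics({"null_rate": 0.12, "duplicate_rate": 0.004}):
    print(alert)
```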

For example, profiling tools can aid in discovering the source of bad information. Once the source is identified, the issue is reviewed against the policy to determine how to address it; typically, there are three basic options. An effective data governance initiative with this kind of monitoring will also improve the overall ROI of the initiative.
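For instance, a basic profiling pass might compute per-column statistics for each source feed so that anomalies can be traced back to where they entered the pipeline. The pandas usage below is a sketch; the column names and source labels are illustrative assumptions.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column profile: null rate, distinct count, and one sample value."""
    return pd.DataFrame({
        "null_rate": df.isna().mean(),
        "distinct":  df.nunique(),
        "example":   df.apply(lambda col: col.dropna().iloc[0] if col.notna().any() else None),
    })

# Profiling each feed separately helps localize where bad records originate.
records = pd.DataFrame({
    "source":  ["crm", "crm", "web", "web"],
    "email":   ["a@example.com", None, "b@example.com", None],
    "country": ["US", "US", None, None],
})
for source, feed in records.groupby("source"):
    print(source, profile(feed.drop(columns="source")), sep="\n")
```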

What common mistakes do businesses make when managing their data, and what steps can they take to avoid them?

The biggest mistake is assuming that a sound data quality initiative is a software issue. The reality is, as I mentioned earlier, that it requires a commitment from the most senior ranks, because it can only succeed with a strong data governance program. When software vendors provide an ROI for their solutions, their assumption is that your data is clean.

Another common scenario is allowing IT to be the final decision maker regarding the data quality platform purchased; the end user must be considered.

What business goals can data-quality initiatives support, and what benefits can businesses realize by actively managing their data?

SaaS is an effective option. Among our customers, we typically see it preferred when the client wants to get up and running immediately.

It is also far more cost-effective when the application is for a specific department--for example, the circulation department of a magazine. We do see resistance to SaaS among some customers. If the resistance stems from a reluctance to abandon existing investments, it is worth noting that there are strategies for layering SaaS solutions onto legacy capabilities--essentially enabling the organization to retain its prior investment while modernizing in a cost-efficient manner.
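One way to layer a SaaS data quality service onto a legacy capability is a thin adapter that the legacy batch job calls like any local function. The sketch below is hypothetical: the endpoint URL, payload, and response fields are invented for illustration and do not describe a real vendor API.

```python
import requests

# Hypothetical SaaS validation endpoint; URL, payload, and response
# fields are illustrative assumptions, not a real vendor API.
SAAS_URL = "https://dq.example.com/v1/validate-address"

def validate_address(record: dict) -> dict:
    """Adapter the legacy batch job can call without knowing about SaaS."""
    resp = requests.post(SAAS_URL, json=record, timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g. {"valid": true, "standardized": {...}}

# Legacy code keeps its existing loop; only the validation step changes.
cleaned = validate_address({"street": "100 Main St", "city": "Springfield"})
```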

What challenges do companies face managing data, and how can better data management practices impact the power of other enterprise applications, including business intelligence, CRM, and ERP?

As I said, a successful data quality endeavor requires commitment at the highest management levels and needs to be supported by a data governance board. As guidelines are put forth, changes in business process are often required. Another misconception concerns the importance of data profiling: just like data cleansing, it needs to be done continually, and strict monitoring must be enforced. From a technical point of view, many are challenged by data integration or data federation requirements. There are many options, including those offered by PBBI.

What products or services does Pitney Bowes Business Insight offer in the areas of data quality and data governance?

The solution has many unique benefits.

Ensuring transparency regarding the fitness for use of electronic health record (EHR) data for analytic studies is fundamental to the responsible use of these data. The primary objective of one such collaborative was to create a data quality assessment (DQA) framework and guidelines specific to the comparative effectiveness research (CER) community [33].

Uniting the many efforts dedicated to validating EHR data [27,34,35,36,37,38,39], this collaborative has developed and published a harmonized DQ terminology [27] as well as standards for reporting DQA results [11]; however, standards for assessing and reporting DQ issues have yet to be thoroughly identified. A number of potential barriers may impede institutional investments in DQA activities. For example, methods for assessing and reporting DQ lack standardization, and organizational stakeholder requirements may mandate very different tools, reporting methods, and assessment strategies than those used by professional researchers aiming to answer specific clinical questions.

Individual and organizational priorities or constraints may also affect how DQ is assessed and how findings are reported. Attempts to understand barriers to performing DQA and reporting DQ findings remain largely unexplored; to the best of our knowledge, there is no existing literature examining DQA and reporting practices employed in the field, nor any work investigating the barriers to performing DQA analytics and reporting DQA results.

The primary goal of this work was to gain insight into the current state of DQA and reporting practices in the field of biomedical informatics. To accomplish this goal, we used a multi-phase, mixed-methods approach; Phases 2 and 3 were developed using insight gained from the preceding phases.

The first stakeholder engagement meeting was held in Washington, DC. Qualitative methods were used to moderate the meeting, which a researcher led using a discussion guide developed by the study team. The meeting collected recommended additions and changes to a proposed DQA framework and guidelines specific to the CER community and discussed the limitations and implications of conducting DQA and reporting DQ results.

Discussion among stakeholder meeting attendees was digitally recorded following consent from all meeting participants. Data were analyzed using qualitative content methods [40] and reflexive team analysis, which emphasizes the inclusion of emergent rather than a priori themes. In July and August, several sites engaged in DQA and reporting were contacted and asked whether they would be willing to host a site visit and be interviewed regarding their current practices.

Questions were pilot tested with biomedical researchers with past DQA experience and reviewed for applicable content, clarity, and completion time. All interviews were conducted one-on-one and took approximately 30 minutes to complete. Detailed notes were taken during interviews, and all participants were re-contacted after the initial interview and offered the opportunity to clarify their responses.

To protect the privacy of the participating personnel and sites, names and locations were anonymized, and interview responses were aggregated within each site. Current DQA analytics and reporting practices at each of the four sites were evaluated descriptively, and iterative thematic content analysis was performed on the interview notes. A survey was then developed using the findings from Phases 1 and 2. Given the sensitivity around potential consequences of reporting negative DQA findings, it was important to provide a mechanism that encouraged participants to answer questions honestly.

For these reasons, an anonymous survey was used. Prior to administration, questions were pilot tested with biomedical researchers with past DQA experience and reviewed for applicable content, clarity, and completion time. In addition to the questions about individual and organizational barriers and about DQA and reporting solutions described below, participants were asked four questions about demographics, five about current employment, and eight about current DQA practices. Based on findings from Phases 1 and 2, questions about individual and organizational barriers to conducting and reporting DQAs, as well as potential solutions to these barriers, were developed.

The organizational barriers questionnaire was created by modifying items from a questionnaire developed to assess barriers to implementing quality management in service organizations in Pakistan [43]. The same five-point Likert scale was used to examine agreement with nine potential organizational barriers, with higher scores indicating a greater perceived individual or organizational barrier. On the solutions items, higher scores indicated greater perceived organizational support for conducting and reporting DQAs.

Additionally, participants were asked to provide any other solutions they felt would support the conducting and reporting of DQAs. Participants who currently work with data as producers or consumers were eligible, and no identifying information was collected. Following the Dillman method of survey research, participants received up to five reminder emails, sent once a week for up to five weeks from the date the initial email was sent [44].

Interested participants were first presented with information about the survey, and those who gave consent then proceeded to complete the rest of it. Analysis was performed using SAS software version 9. Univariate statistics were used to examine the frequencies of responses to survey questions. Exploratory factor analysis, with principal components analysis as the method of factor extraction and varimax rotation, was used to explore the individual and organizational barriers scales for correlated variables reflecting an underlying factor structure in the data.
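The analysis pipeline described here and in the next paragraph can be sketched in a few lines. The sketch below is illustrative only: the data are simulated, scikit-learn and SciPy stand in for the SAS software actually used, the varimax routine is a standard textbook implementation, and items are assigned to their strongest factor because the paper's loading cutoff is not stated.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def varimax(loadings, gamma=1.0, n_iter=100, tol=1e-6):
    """Rotate a loading matrix to maximize the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(loadings.T @ (
            rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        ))
        rotation = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ rotation

# Simulated stand-in for the real survey: 120 participants x 9 Likert items.
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(120, 9)).astype(float)

# Extract principal components from standardized items, then rotate loadings.
z = StandardScaler().fit_transform(responses)
pca = PCA(n_components=3).fit(z)
loadings = varimax(pca.components_.T * np.sqrt(pca.explained_variance_))

# Assign each item to the factor it loads on most strongly, then average
# those items into per-participant subscales on the original 1-5 scale.
assigned = np.abs(loadings).argmax(axis=1)
subscales = np.column_stack([
    responses[:, assigned == f].mean(axis=1) if np.any(assigned == f)
    else np.full(len(responses), np.nan)  # guard: factor with no items
    for f in range(loadings.shape[1])
])

# One-way ANOVA comparing the first subscale across hypothetical job groups.
groups = rng.integers(0, 3, size=120)
print(f_oneway(*(subscales[groups == g, 0] for g in range(3))))
```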

Items that loaded strongly on a factor were averaged into a subscale for each participant; this method was selected over analyzing factor scores because it allowed the subscale values to be interpreted on the Likert scale of the original item responses. Analysis of variance (ANOVA) was used to compare the overall individual and organizational barriers scales, as well as the subscales derived from the factor analysis.

The second stakeholder engagement meeting was held in Washington, DC, in June. Attendees were similar to those at the first stakeholder engagement meeting, and roughly half of them had participated in Phase 1.

Qualitative methods and group discussion procedures were similar to those used in Phase 1.

During the first part of the discussion, the research team asked stakeholders to report whether the survey results were consistent with, or conflicted with, their individual and organizational work as producers or consumers of data. Discussion among stakeholder meeting attendees was collected, recorded, and analyzed using the same procedures outlined for the first stakeholder engagement meeting (Phase 1). The meeting attendees identified unintended consequences as a primary barrier to conducting and reporting DQAs.

When discussing solutions that would address DQA barriers, participants recommended enacting guidelines to protect against negative repercussions when reporting DQA findings, developing remediation plans to deal with DQA-related issues, requiring peer-reviewed journals to mandate the inclusion of DQA results with submissions, and having institutions and funding agencies identify DQA resources.

Importantly, attendees believed that systematically conducting and reporting DQA would require a culture change in expectations regarding how DQAs are conducted and reported and how findings are interpreted. The findings from this phase provided valuable insight into stakeholder-identified DQA barriers.

Interpretation of these findings was limited by a lack of information about the organizations involved. Phase 2 was designed to elicit this type of information from well-established private sector and academic organizations currently engaged in this type of work. A total of 19 interviews were completed across four separate site visits; participating sites included one academic site (Site 1) and three others (Sites 2-4). Participants were asked to discuss key DQ dimensions without prompting from a list of DQ elements.

This was done intentionally to reduce confirmation bias during the interviews. Sites 2-4 had a standardized remediation plan in place to resolve DQ anomalies. Sites were compared according to whether they were managed by internal stakeholders (Site 4), external stakeholders (Site 2), or both internal and external stakeholders (Sites 1 and 3). Site 2, which was externally managed, valued data validity more than any other DQ dimension.

Site 4 was internally managed and was the only site that conducted commercial research within a large collaborative of independent researchers. Results from the interviews (Phase 2) facilitated a general characterization of sites currently conducting DQAs. A general theme across all sites was the notion that DQA barriers and solutions can exist at the level of the individual as well as that of the organization.

Phase 3 was designed to elicit more specific information regarding DQA barriers and solutions at both the individual and organizational levels. This assessment also included questions about demographics, employment characteristics, and current DQA practices. Of the participants who provided consent, 30 did not complete the survey and were thus excluded from analysis. Most of the sample was white or Asian, middle-aged, and highly educated. The most important and most commonly evaluated aspects of DQ were the consistency and completeness of the data. Responses to the individual barriers items are presented in Appendix A (Figure A1).

Over three-fourths of the participants strongly agreed or agreed that an important individual barrier was a lack of resources (i.e., funding, time, or knowledge). A fear that DQA would invalidate prior research, or a fear of colleagues leaving collaborations, was cited less frequently. Excess layers of interfering management, frequent data-owner turnover, and the cost of implementing DQA outweighing its benefits were also cited less frequently. Most responses to the barrier questions did not significantly differ by participant job characteristics. The solution rated least encouraging was professional and financial protections.

The first factor (Personal Consequences) comprised six items covering serious career consequences resulting from DQA, such as invalidating prior work or colleagues leaving collaborations. The second factor (Process Issues) comprised three items concerning the implementation of DQA in the analysis pipeline. The third factor (Lack of Resources) contained just two items, dealing with a lack of funding, time, or knowledge for conducting DQAs.

Two organizational barriers items were excluded from the factors due to strong cross-loadings. Figure A4 (Appendix A) shows the level of agreement with the overall individual barriers scale and with each of the three factor subscales. The organizational barriers showed values in the neutral-to-agree range for the overall scale and for both factor subscales. To confirm the survey results, responses were reviewed with stakeholders attending the second stakeholder engagement meeting. From an individual barriers perspective, attendees agreed with survey participants.

They had experienced a lack of guidelines and resources for conducting and reporting DQA, and added that there was a general feeling of powerlessness to affect the quality of the data sets they received. Meeting attendees suggested possible solutions to DQA barriers, including access to case studies that would demonstrate the importance and applicability of DQA to different user types. They also identified data source issues as a barrier, as there are often difficulties in determining the source of data or controlling the quality of data received.

Applications for the secondary use of EHR data are immense, ranging from the investigation of rare and chronic diseases and quality improvement to the repurposing of medications and hospital accreditation [46]. Assessing and reporting the quality of these data will play an important role in determining their utility. Understanding the barriers to conducting and reporting DQA experienced by data professionals may focus efforts to alleviate current barriers and could ultimately increase trust in the secondary use of EHR data, the sharing of DQA results within organizations, and the pursuit of personalized medicine.

To the best of our knowledge, this exploration is the first formalized attempt to understand barriers to performing DQA and reporting DQ results. Themes concerning the consequences of reporting DQA findings and support for DQA and reporting were consistent across all participants and phases; the findings from each phase are discussed in reference to these themes below. The idea of unintended consequences resulting from reporting negative DQA findings was first raised during the initial stakeholder engagement meeting (Phase 1).

Meeting attendees were concerned about personal as well as organizational consequences that could result from reporting poor DQA results. Conversely, survey participants (Phase 3) were much less concerned about potential negative personal, professional, or institutional reputation-related consequences of reporting poor DQA results. The mixed findings between phases related to the consequences of reporting poor DQA are interesting and warrant future investigation.

These results imply that the consequences of reporting negative DQA findings may be more complex than initially hypothesized, operating at a level other than the individual or the organization; this specific idea was not thoroughly examined in the current work. While this work is the first to provide empirical evidence suggesting that reputation-related punitive consequences are a barrier to DQA within the field of biomedical informatics, the fear of reporting negative findings is not a new phenomenon within the medical field.

A study examining adverse event reporting in hospitals found that people were unlikely to report an error due to a reluctance to accuse oneself and a fear of malpractice suits [47]. While these studies investigated different populations than the current study, the fear of punitive consequences was reported across all phases of the current project.

Investigating effective solutions from these different fields may be beneficial for reducing these types of barriers within the domain of clinical research informatics. For example, a study investigating the frequency of medical error reporting by pharmacists working in an inpatient setting found that pharmacists who felt they could openly communicate were 40 percent more likely to have reported a medical error within the past year [48].

Similar to attendees of the first stakeholder meeting (Phase 1), survey participants (Phase 3) agreed that a strong potential barrier to DQA and reporting was a lack of adequate funding and time to perform DQAs. They also agreed that a lack of guidelines around desirable versus undesirable DQA results was a likely barrier, both to defining DQ issues and to designing appropriate DQ action plans. Although limited to a few settings, the key personnel interviews (Phase 2) provide some initial evidence of the effect of resources on DQA practices.

Specifically, the sites using customized programs and tools for conducting and reporting DQAs (Sites 2 and 4) were those with multiple employees who had dedicated time for DQA. Potential solutions to DQA barriers identified by the survey group mirrored those identified by attendees at both the first and second stakeholder engagement meetings (Phases 1 and 4). Solutions included the need for organizational resources and support, as well as established standards and processes for conducting DQAs to help data handlers determine whether DQA and reporting guidelines have been met.

While attendees and participants from all phases of the project wanted better guidelines for conducting DQA, a stronger theme was the need for a significant culture change within their data community or organization. This finding suggests that the DQA community should develop and adopt an infrastructure that standardizes and facilitates the conducting and reporting of DQAs. Several efforts are currently making progress in this area.

Recent work by Kahn et al. provides a framework with a set of guidelines for characterizing the DQ of a data source to determine its fitness for use. The same authors recently proposed a harmonized DQA terminology in an effort to encourage the standardization of different DQ characteristics [49]. Integrating the harmonized DQA terminology with the reporting framework proposed by Kahn et al. would further advance this standardization.

Finally, work by Callahan et al. provides one mechanism to foster community alignment toward systematically conducting DQA work by encouraging collaboration between organizations and individuals, regardless of how mature their current DQA processes are. The current study has important limitations. First, we were unable to perform extensive pilot testing of the interview and survey questions with a population similar to that of the participants.

This type of testing is important because it can identify potential issues in the development of the survey. Second, scale items for the survey were developed through literature reviews, expert discussions, and the modification of other measures; while these are reliable sources, the survey has yet to receive formal validation or testing. Finally, the individual and organizational barriers and solutions were based on hypothetical scenarios, and the degree to which they reflect actual DQA and reporting practices is uncertain. Using feedback from the expert discussions, the survey items should be modified and validated.

Results from the current study can be used to explore ways to incorporate needs assessment into a pragmatic DQA and reporting plan. A set of common practices should also be drafted for individuals and organizations that are not currently implementing DQ checks but want to adopt DQA and reporting practices; an initial set of recommendations has been published [11]. Finally, DQ issues can arise at different stages of data use. This study is the first of its kind, facilitating an in-depth examination of DQA practices with a specific focus on individual and organizational barriers to conducting and reporting DQAs.

The results from this survey facilitated the identification of several individual and organizational barriers and helped to identify solutions. This work can inform the development of DQA and reporting standards and provide recommendations for clinicians, clinical researchers, and organizations intending to leverage health data sources in need of DQ evaluation. Without the feedback and insight provided by the participants, this study would not have been possible. Additionally, we would like to thank AcademyHealth for hosting both of these meetings.

We would also like to thank Dr. Carsten Gorg, who helped proofread the survey prior to administration. Finally, we would like to thank everyone who helped with respondent recruitment, as well as the anonymous respondents who completed the survey.