Timewarp datasets

This dataset contains molecular dynamics simulation data that was used to train the neural networks in the NeurIPS 2023 paper Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics by Leon Klein, Andrew Y. K. Foong, Tor Erlend Fjelde, Bruno Mlodozeniec, Marc Brockschmidt, Sebastian Nowozin, Frank Noé, and Ryota Tomioka. Please see the accompanying GitHub repository.

This dataset consists of many molecular dynamics trajectories of small peptides (2-4 amino acids) simulated with an implicit water force field. For each peptide, two files are available:

  • protein-state0.pdb: contains the topology and initial 3D XYZ coordinates.
  • protein-arrays.npz: contains trajectory information.
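Both files can be inspected with a few lines of Python. The sketch below is an illustration, not tooling from the Timewarp repository. It assumes NumPy is installed and reads the PDB file using the fixed-column ATOM/HETATM layout of the PDB format. It lists whatever arrays the `.npz` file holds without assuming their key names, because this card does not document them. The function names `read_pdb_atoms` and `summarize_npz` are hypothetical.

```python
import numpy as np


def read_pdb_atoms(path):
    """Parse ATOM/HETATM records from a PDB file using the fixed PDB columns.

    Returns a list of (atom_name, residue_name) tuples and an (n_atoms, 3)
    float array of Cartesian coordinates (Angstrom in the PDB format).
    """
    names, coords = [], []
    with open(path) as fh:
        for line in fh:
            if line.startswith(("ATOM", "HETATM")):
                atom_name = line[12:16].strip()  # columns 13-16
                res_name = line[17:20].strip()   # columns 18-20
                x = float(line[30:38])           # columns 31-38
                y = float(line[38:46])           # columns 39-46
                z = float(line[46:54])           # columns 47-54
                names.append((atom_name, res_name))
                coords.append((x, y, z))
    return names, np.array(coords, dtype=float).reshape(-1, 3)


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz file.

    No key names are assumed; this only reports what the file contains.
    """
    with np.load(path) as data:
        return {key: (data[key].shape, data[key].dtype) for key in data.files}
```

For example, `read_pdb_atoms("protein-state0.pdb")` returns the atom list and an `(n_atoms, 3)` coordinate array. `summarize_npz("protein-arrays.npz")` reports the names, shapes, and dtypes of the stored trajectory arrays. Running it is a reasonable first step before relying on any particular key.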

The datasets are split into the following directories:

2AA-1-big "Two Amino Acid" data set

This folder contains a data set of all-atom molecular dynamics trajectories for 380 of the 400 dipeptides, i.e. small proteins composed of two amino acids. This dataset was originally created without 20 of the 400 possible dipeptides; the 2AA-1-complete dataset fills this gap by including all 400. Each peptide is simulated using classical molecular dynamics, and the water is simulated using an implicit water model. Frames are saved only every 10000 MD steps; unlike the other datasets for the Timewarp project, no frames at intermediate spacings are stored.
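Because 2AA-1-big stores a frame only every 10000 MD steps, mapping a frame index back to simulation time is simple arithmetic. This minimal sketch assumes that frame 0 is the initial configuration and that frames are evenly spaced. The integration timestep is not given in this card, so it is left as a parameter. The helper names are hypothetical.

```python
# From the dataset description: one saved frame every 10000 MD steps.
SAVE_INTERVAL_STEPS = 10_000


def frame_to_md_step(frame_index, save_interval=SAVE_INTERVAL_STEPS):
    """MD step at which a saved frame was written.

    Assumes (not stated in the card) that frame 0 is the initial state
    and that frames are evenly spaced.
    """
    return frame_index * save_interval


def frame_spacing_ps(timestep_fs, save_interval=SAVE_INTERVAL_STEPS):
    """Physical time between consecutive saved frames, in picoseconds.

    timestep_fs is the MD integration timestep in femtoseconds. It is NOT
    given in this card and must be taken from the simulation settings.
    """
    return save_interval * timestep_fs / 1000.0
```

For example, with a hypothetical 0.5 fs timestep, `frame_spacing_ps(0.5)` returns 5.0 ps between saved frames. Substitute the actual timestep from the simulation setup.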

2AA-1-complete "Two Amino Acid" data set

This folder contains a data set of all-atom molecular dynamics trajectories for all 400 dipeptides, i.e. small proteins composed of two amino acids. This includes the peptides missing from the other 2AA datasets. Each peptide is simulated using classical molecular dynamics, and the water is simulated using an implicit water model.
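The count of 400 dipeptides follows from the 20 standard amino acids. Sequences are ordered from N- to C-terminus, so AC and CA are distinct peptides, giving 20 × 20 = 400. The sketch below enumerates them with `itertools.product`. The one-letter sequence strings are for illustration only and do not necessarily match how the dataset names its files. The same count gives 20^4 = 160000 possible tetrapeptides, so the 4AA datasets cover only a subset of sequence space.

```python
from itertools import product

# One-letter codes of the 20 standard proteinogenic amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_peptides(length):
    """All ordered sequences of `length` standard amino acids (N- to C-terminus)."""
    return ["".join(seq) for seq in product(AMINO_ACIDS, repeat=length)]


dipeptides = all_peptides(2)
print(len(dipeptides))   # 400 = 20**2
print(dipeptides[:3])    # ['AA', 'AC', 'AD']
print(20 ** 4)           # 160000 possible tetrapeptides
```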

4AA-huge "Four Amino Acid" data set, tetrapeptides

This folder contains a data set of all-atom molecular dynamics trajectories for tetrapeptides, i.e. small proteins composed of four amino acids. The data set mostly contains validation and test trajectories, as it was primarily used for validation and testing; the training trajectories used are usually shorter. Each peptide is simulated for 1 microsecond using classical molecular dynamics, and the water is simulated using an implicit water model.

4AA-large "Four Amino Acid" data set, tetrapeptides

This folder contains a data set of all-atom molecular dynamics trajectories for 2333 tetrapeptides, i.e. small proteins composed of four amino acids. The data set is split into 1500 tetrapeptides in the training set, 400 in the validation set, and 433 in the test set. Each peptide in the training set is simulated for 50 ns using classical molecular dynamics, and the water is simulated using an implicit water model. Each validation and test peptide is simulated for 500 ns.
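The 4AA-large split sizes add up to the stated total. Multiplying each split by its per-peptide simulation length shows where most of the simulated time lies. This is a bookkeeping sketch built only from the numbers above. The `SPLITS` dictionary is a hypothetical structure, not a file shipped with the dataset.

```python
# Split sizes and per-peptide simulation lengths as stated for 4AA-large.
SPLITS = {
    "train": {"n_peptides": 1500, "ns_per_peptide": 50},
    "validation": {"n_peptides": 400, "ns_per_peptide": 500},
    "test": {"n_peptides": 433, "ns_per_peptide": 500},
}

total_peptides = sum(s["n_peptides"] for s in SPLITS.values())
# Total simulated time per split, in nanoseconds.
simulated_ns = {name: s["n_peptides"] * s["ns_per_peptide"] for name, s in SPLITS.items()}

print(total_peptides)                      # 2333
print(simulated_ns)                        # {'train': 75000, 'validation': 200000, 'test': 216500}
print(sum(simulated_ns.values()) / 1000)   # 491.5 (microseconds in total)
```

Validation and test together account for roughly 85% of the simulated time, although the training peptides outnumber them.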

Responsible AI FAQ

  • What is Timewarp?
    • Timewarp is a neural network that predicts the future 3D positions of a small peptide (2-4 amino acids) based on its current state. It is a research project that investigates using deep learning to accelerate molecular dynamics simulations.
  • What can Timewarp do?
    • Timewarp can be used to sample from the equilibrium distribution of small peptides.
  • What is/are Timewarp’s intended use(s)?
    • Timewarp is intended for machine learning and molecular dynamics research purposes only.
  • How was Timewarp evaluated? What metrics are used to measure performance?
    • Timewarp was evaluated by comparing the speed of molecular dynamics sampling with standard molecular dynamics systems that rely on numerical integration. For some peptides, Timewarp is faster than these standard systems.
  • What are the limitations of Timewarp? How can users minimize the impact of Timewarp’s limitations when using the system?
    • As a research project, Timewarp has many limitations. The main ones are that it only works for very small peptides (2-4 amino acids), and that it does not lead to a wall-clock speedup for many peptides.
  • What operational factors and settings allow for effective and responsible use of Timewarp?
    • Timewarp should be used for research purposes only.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
