Yuri
06.07.2018
10:25:17
Looks like a very good dataset!
k
06.07.2018
10:26:00
Could you please clarify what the timestep definition would be in my case?
Yuri
06.07.2018
10:26:01
So, basically, you could read DeepSpeech2 for an idea of what architecture you can take
You can group them, yes, with some conv layer on the front
Yuri
06.07.2018
10:26:41
Or can pass them individually
Or you can pass them in parallel but without a conv layer, though for an LSTM it'll be a little harder to figure out all the connections in the data
k
06.07.2018
10:28:03
I already used Conv1D with LSTM, and even LSTM alone, but I'm stuck on a very basic question: what is the timestep here for me?
Yuri
06.07.2018
10:28:40
For most packages you need to flatten both dimensions into one then
But other than this, feels right
Like, (T*F, batchsize)
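A minimal NumPy sketch of that flattening (the shapes are made up for illustration, and the axis order depends on the framework's convention):

```python
import numpy as np

# Hypothetical shapes: B sequences per batch, T timesteps per group, F features.
T, F, B = 40, 19, 8
batch = np.random.randn(B, T, F)

# Flatten the time and feature axes into a single dimension, as in (T*F, batchsize):
flat = batch.reshape(B, T * F)
print(flat.shape)  # (8, 760)
```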
k
06.07.2018
10:31:06
Yuri
06.07.2018
10:34:43
Yes
Also, if you were to pass the data individually and use a custom metric that checks every 40 steps, it wouldn't have any benefit over this grouped solution, I guess.
k
06.07.2018
10:42:16
Yuri
06.07.2018
10:46:08
k
06.07.2018
10:51:11
So then what do you suggest for an inter-person policy? It seems the model cannot find a common pattern across different people.
Someone suggested splitting everyone's signal into 10-second chunks, for instance, shuffling this bucket of 10-second chunks, and then feeding the network with large batches.
But the problem here is that even in a stateless LSTM the internal state is kept within a batch, so if we have a sequence like seq1, seq2, ..., seqN which are randomly shuffled and each seq can belong to any arbitrary person, the LSTM treats seq1 as the history of seq2 and so on, which is not correct.
Arcady
06.07.2018
11:16:31
there must be a "clear" option in the lib you are using
k
06.07.2018
11:20:30
Arcady
06.07.2018
11:21:34
Consider batch = 1 sample. Then you can reset the state of the LSTM after training it on each sample.
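In Keras (the library k mentions using later in the thread), that per-sample reset could be sketched like this; the layer sizes and the dummy per-person data are assumptions, not anything from the dataset:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# A stateful LSTM with batch size 1: its state carries over between
# batches until we reset it explicitly.
model = Sequential([
    LSTM(32, stateful=True, batch_input_shape=(1, 10, 19)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# One hypothetical sample per person; clear the state between people
# so one person's history never leaks into the next.
people = [(np.random.randn(1, 10, 19), np.random.randn(1, 1)) for _ in range(3)]
for person_data, person_label in people:
    model.train_on_batch(person_data, person_label)
    model.reset_states()  # the "clear" option mentioned above
```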
k
06.07.2018
11:24:03
Arcady
06.07.2018
11:25:59
A completely different method, e.g. in the case of machine translation or text classification, is to put a "termination symbol" at the end of each text sample.
k
06.07.2018
11:27:22
Since the temporal dependency exists within each person's data, a simple shuffling won't work, so I tried walk-forward validation, etc., but it's still not robust enough.
Arcady
06.07.2018
11:28:55
k
06.07.2018
11:29:33
I'm using Keras with a TF backend
Arcady
06.07.2018
11:29:40
k
06.07.2018
11:35:38
Arcady
06.07.2018
11:36:00
apart from using another lib no :)
k
06.07.2018
11:37:56
Yuri
06.07.2018
15:41:07
There should be no problem with having multiple people's data in a batch.
Also, regarding learning stateful behaviour: you can try to subsample the data. Instead of providing every 40 timesteps, give only 10 or 5 or 1. Or instead of measuring every second, do 5x sampling of the data and average 5 labels into 1 (*corrected). This might prevent the kind of overfitting/generalization failure that comes from learning local behaviour instead of global behaviour.
Also it's important to remember that there are cases and tasks where an LSTM layer doesn't help much, because there isn't much reliable correlation in the global behaviour.
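The 5x-sampling-and-averaging idea above could be sketched in NumPy (all shapes are hypothetical):

```python
import numpy as np

# Hypothetical data: 100 timesteps of 19 features, one label per timestep.
x = np.random.randn(100, 19)
y = np.random.randn(100)

k = 5  # keep every 5th input frame, and average every 5 labels into 1
x_ds = x[::k]                         # (20, 19): 1 frame out of each 5
y_ds = y.reshape(-1, k).mean(axis=1)  # (20,): 5 labels collapsed into 1
print(x_ds.shape, y_ds.shape)  # (20, 19) (20,)
```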
Yuri
06.07.2018
15:50:54
Imagine that you would do the same as you do for EEG but for the sounds that people make:
Once per second, measure the sound they pronounce, and attempt to find a correlation between the sounds made 1 second ago, 2 seconds ago, 3 seconds ago and now. On that timeframe, the only thing that stays the same is a person's timbre, and it's different for each person: exactly what you found in your dataset!
k
06.07.2018
15:55:16
Yuri
06.07.2018
15:57:10
Well, in a few words, I rather don't understand how an LSTM could help you at all.
Let's talk in terms of the second FFT transform: at what frequencies does your learnable signal mostly live?
k
06.07.2018
15:58:57
Yuri
06.07.2018
15:59:02
Again, for a comparison with sounds, frequencies longer than 1 Hz are of no interest.
k
06.07.2018
16:00:36
Yuri
06.07.2018
16:00:46
Aha, no: you have a label for every 40 ms and data for every 1 ms.
k
06.07.2018
16:04:30
Yuri
06.07.2018
16:05:13
A CNN, I think. And maybe an FFT before the CNN.
k
06.07.2018
16:06:26
Actually, I have used some calculated features on this 40 ms time frame, like FFT, and used them instead of the raw data. I even applied Conv1D to them, and eventually I saw that mixing that with an LSTM gives me a slightly better result.
Yuri
06.07.2018
16:09:53
Also consider different preprocessing options and find the best one. First of all, you can try a fixed model like CNN+FC layers and compare different preprocessing steps in front of it.
For sounds, scientists found the best preprocessing a long time ago, and training an NN to reproduce it as part of itself isn't rational, because it makes learning much slower and much more data is needed to learn it.
k
06.07.2018
16:17:22
Yuri
06.07.2018
16:17:50
yeah, indeed
so good research would look like finding a good combination of parameters for the following:
0) choosing a baseline performance for your study (and optionally performing a baseline analysis)
1) preprocessing (no FFT; FFT: hann/hamming, frame size, window size, overlap)
2) architecture: finding the best architecture.
3) learning the possible reasons for overfitting and measuring their impact on the final quality.
4) "theoretical maximum" quality: probably a kind of analysis of the data variance across people, data noise (maybe by trying to smooth the data) and label noise (how often similar data leads to different labels).
You can take a small part of the dataset for most of these studies, so the network trains very fast (in several minutes on modern GPUs).
I'd also suggest taking initial values for all the parts from other people's work.
k
06.07.2018
16:23:08
Thanks a lot, Yuri. I'm now thinking about how I can combine the raw data and calculated features at the same time: I mean a CNN with several filters for the raw data, plus some domain-related features like heart rate, at the same time. Is it possible, and do you recommend it?
Yuri
06.07.2018
19:41:29
Yes, absolutely. You can approximate heart beats with a linear, a cosine, or an exponentially decaying function, I think.
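A two-branch Keras model along these lines might look as follows; every size here (window length, filter counts, 19 hand-crafted features) is a hypothetical placeholder, not something from the dataset:

```python
import numpy as np
from tensorflow.keras import layers, Model

# Branch 1: Conv1D over raw signal windows (e.g. 40 samples, 1 channel).
raw_in = layers.Input(shape=(40, 1))
x = layers.Conv1D(16, kernel_size=5, activation="relu")(raw_in)
x = layers.GlobalMaxPooling1D()(x)

# Branch 2: hand-crafted features (e.g. heart rate, FFT statistics).
feat_in = layers.Input(shape=(19,))

# Concatenate both representations and regress on top of the merged vector.
merged = layers.concatenate([x, feat_in])
hidden = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1)(hidden)

model = Model(inputs=[raw_in, feat_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
pred = model.predict([np.random.randn(2, 40, 1), np.random.randn(2, 19)])
print(pred.shape)  # (2, 1)
```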
Evgeniy
07.07.2018
05:29:18
k
07.07.2018
07:50:12
Yuri
07.07.2018
10:07:15
k
07.07.2018
11:18:48
Also, this is the description of them in the dataset:
"- features_ECG/*.arff
arff files containing features computed from the filtered ECG signal with a sliding centred window which size depends on the modality (arousal -> ws=4s, valence -> ws=10s; optimised on the dev partition).
Features are thus provided separately for each of those two dimensions.
The first feature vector is assigned to the center of the window, and duplicated for the previous frames - from 1 to ws/(2*sampling_period)-1, with the sampling period being equal to 40ms.
The last feature vector is also duplicated for the last frames.
19 features x 7501 frames per file"
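The duplication count in that description can be checked with the stated numbers (ws = 4 s for arousal, 10 s for valence, sampling period 40 ms):

```python
# Number of leading frames that receive the duplicated first feature vector:
# ws / (2 * sampling_period) - 1, per the dataset description.
sampling_period = 0.04  # 40 ms

ws_arousal = 4.0
n_dup_arousal = ws_arousal / (2 * sampling_period) - 1
print(n_dup_arousal)  # 49.0

ws_valence = 10.0
n_dup_valence = ws_valence / (2 * sampling_period) - 1
print(n_dup_valence)  # 124.0
```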
This is the raw data structure:
If I want to feed this data to the model, which approach do you suggest is better?
1) scaling all the data, for instance into the range [-1, 1]
2) or instead, using a BatchNormalization layer as the input layer of the model, without any data scaling?
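Option 1, min-max scaling into [-1, 1], is a few lines of NumPy; the data here is made up, and the key point is to fit the min/max on the training split only, to avoid leaking test statistics:

```python
import numpy as np

train = np.random.randn(100, 19) * 7 + 3  # hypothetical feature matrix

# Per-feature range, estimated from the TRAINING data only.
lo, hi = train.min(axis=0), train.max(axis=0)

def scale(a):
    """Map each feature linearly so that [lo, hi] becomes [-1, 1]."""
    return 2 * (a - lo) / (hi - lo) - 1

train_s = scale(train)
print(train_s.min(), train_s.max())  # -1.0 1.0
```

At inference time the same `lo`/`hi` are applied to new data, so values outside the training range can fall slightly outside [-1, 1].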
Yuri
07.07.2018
12:57:41
Andrey
07.07.2018
14:53:28
> tools to predict feature importance
https://github.com/marcotcr/lime (https://github.com/thomasp85/lime) as an example. It can be used for DL models, too (see original paper https://arxiv.org/abs/1602.04938)
k
07.07.2018
15:36:30
Roman
07.07.2018
17:02:11
Good evening!
I'm looking for someone who would be responsible for the deep learning side of a project.
Right now the team is me (I cover almost all questions, plus a bit of development) and an Android developer.
If anyone is interested, write me a DM.
k
09.07.2018
20:14:49
Hi again everybody
My new question is:
Let's say we have a psychology-related experiment to collect data, and each person is asked to give rating feedback.
As usual in almost all psychology domains, people give quite different ratings based on their understanding, mood, knowledge, etc.
If we want to use these ratings as labels for classification or regression on some dataset, we will face a wide variety of ratings for every specific rating question, and the problem is that we won't have a reliable ground truth just by taking the simple average over all raters.
We need a weighted average that gives higher weights to more correlated and similar raters.
So could you please guide me on how to define such an inter-rater reliability coefficient?
Or do you have other suggestions?
Especially when the feedbacks are not categorical but real numbers in a range [a, b].
There is Cohen's kappa statistic, but it's only for categorical data, and I don't know any others.
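One possible sketch of the weighted average described above: weight each rater by the correlation of their ratings with the mean of the other raters. This is only an illustration of the idea, not a standard coefficient; for continuous ratings, the intraclass correlation coefficient (ICC) or Krippendorff's alpha are the usual reliability measures. The data here is synthetic, with one deliberately noisy rater:

```python
import numpy as np

# ratings[i, j]: rating of item j by rater i (synthetic: raters 0 and 1
# are close to a common "truth", rater 2 is much noisier).
rng = np.random.default_rng(0)
truth = rng.uniform(0, 10, size=50)
ratings = np.stack([truth + rng.normal(0, s, size=50) for s in (0.5, 0.5, 5.0)])

# Weight each rater by their correlation with the mean of the OTHER raters,
# clipped at zero so anti-correlated raters get no weight.
n = ratings.shape[0]
weights = np.empty(n)
for i in range(n):
    others = np.delete(ratings, i, axis=0).mean(axis=0)
    weights[i] = max(np.corrcoef(ratings[i], others)[0, 1], 0.0)
weights /= weights.sum()

consensus = weights @ ratings  # weighted average per item
print(weights)  # the noisy third rater gets the smallest weight
```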
Диер
13.07.2018
14:09:20
Hello guys.
I'm fairly new to reinforcement learning. I have implemented DQN in the past and now I'm working on A3C for a custom environment. And I noticed that in DQN I used an epsilon greedy policy, so I used something like this to force exploration:
if random.random() < eps:
    return random.randint(0, num_actions - 1)  # explore with probability eps
else:
    return np.argmax(model.predict(state))  # otherwise exploit the greedy action
But in A3C I am using this instead:
policy = model.predict(state)
return np.random.choice(num_actions, p=policy)
As far as I know, this is used to make the model conservative about its actions: we are trying to encourage the model to give a much higher probability (close to 1) to good actions and reduce unpredictability.
In A3C we use a critic model to predict the value, which is basically an n-step return (expected reward over the next n steps), right?
But the question is, why do we use different approaches? Can I use an epsilon-greedy policy in A3C or vice versa? Which one is better, and when? Or is there a certain type of environment that requires one of them? And what if my environment is impossible to predict (I mean the future reward), but it is possible to develop a strategy that can beat the game? Let's say it is a game where you start from a random point and never know what obstacle will come out, but you know for sure that you have to avoid them. Do I have to predict the value then?
k
19.07.2018
19:43:09
Hi everybody
As you know, it's common to reuse a pretrained DNN model for the early layers in most image-processing projects. I'm looking for such a transfer-learning strategy for ECG or EEG signal processing to speed things up. Do you know any existing model you could refer me to?
Or do you have any comment or idea about such a decision? Do you recommend it?
Thanks.
Michael
20.07.2018
20:52:02
Guys, can you advise on speech recognition: I see different kinds of audio processing there: MFCC, filter banks, including delta + delta-delta. The input sizes end up very different: from (timesteps, 13) with MFCC, to (timesteps, 39), or even (timesteps, 161) for linear spectrograms. This is all for LibriSpeech on DeepSpeech models.
Konstantin
20.07.2018
20:52:42
Michael
20.07.2018
20:53:37
What code are you using?
Konstantin
20.07.2018
20:54:28
At the moment I'm writing my own; so far even without neural nets. I stumbled over them and realized I need to look at classical methods first.
Michael
20.07.2018
20:54:46
Stumbled how?
Yuri
21.07.2018
03:42:41
Michael
22.07.2018
03:17:23
Yuri
22.07.2018
04:01:32